Abstract

Single fault sequential change point problems have become important in modeling for various 
phenomena in large distributed systems, such as sensor networks. But such systems in many situations 
present multiple interacting faults. For example, individual sensors in a network may fail and detection is 
performed by comparing measurements between sensors, resulting in statistical dependency among faults. 
We present a new formulation for multiple interacting faults in a distributed system. The formulation 
includes specifications of how individual subsystems composing the large system may fail, the information 
that can be shared among these subsystems and the interaction pattern between faults. We then specify 
a new sequential algorithm for detecting these faults. The main feature of the algorithm is that it uses 
composite stopping rules for a subsystem that depend on the decision of other subsystems. We provide 
asymptotic false alarm and detection delay analysis for this algorithm in the Bayesian setting and show 
that under certain conditions the algorithm is optimal. The analysis methodology relies on novel detailed 
comparison techniques between stopping times. We validate the approach with some simulations. 

I. Introduction 

Sequential change point detection problems have been widely studied [12] when involving a single fault
or multiple hypotheses based on a single change. New large distributed systems exhibit fault behaviors that
require modeling of multiple correlated faults [4]. For example, in a sensor network each sensor can fail
independently of the others, and the correlation between pairs of sensors can be used for diagnosis (e.g.
see [21]). The faults are interacting since a fault in any pair of sensors causes a change in the correlation 
between them. In this paper we are concerned with the problem of detecting multiple interacting faults. 
This requires a new formulation that differs from the single fault problem. 

Single faults. Classic sequential change point detection [12] is concerned with variations on the following
basic problem: given a sequence of random observations $\{X_k, k \geq 1\}$ such that $X_k$ is distributed with
density $f_0$ (i.e. $X_k \sim f_0$) if $k < \lambda$ and $X_k \sim f_1$ if $k \geq \lambda$ for a random change time $\lambda \sim \pi$, find a procedure
$\nu$ that detects and stops at time $n$ if $\lambda \leq n$ on the basis of the observations $F_n(X) = \{X_k, 1 \leq k \leq n\}$.
The change behavior can be compactly denoted by $f_0 \xrightarrow{\lambda} f_1$. Various solutions have been proposed for
this problem, such as the CUSUM [18] and the Shiryaev-Roberts-Pollak (SRP) [22], [23] procedures.
The gist of these approaches is a threshold test for the likelihood ratio at time $n$

$$\Lambda_n(X) = \frac{P(\lambda \leq n \mid F_n(X))}{P(\lambda > n \mid F_n(X))} \qquad (1)$$

when using the complete Bayesian model. The threshold $B_\alpha$ is chosen to satisfy a false alarm constraint
$P(\nu < \lambda) \leq \alpha$. The procedure can be defined as the stopping time $\nu$ such that

$$\nu = \inf\{n : \Lambda_n(X) \geq B_\alpha\}, \qquad (2)$$

and its performance is measured by the $m$-th moment of the detection delay

$$D_m^{\pi}(\nu) = E^{\pi}\left[(\nu - \lambda)^m \mid \nu \geq \lambda\right],$$

where $E^{\pi}$ denotes expectation with respect to the prior of $\lambda$ and typically $m = 1$ or $m = 2$. Asymptotic
performance of single change point procedures under this and other performance criteria has been extensively
analyzed (e.g. [2], [11], [13], [20], [25]). In particular, Tartakovsky et al. [25] show that the asymptotic delay
of the SRP rule under diminishing false alarm probability and threshold

$$B_\alpha = \frac{1 - \alpha}{\alpha} \qquad (3)$$

is

$$D_m^{\pi}(\nu) \asymp \left[\frac{|\log \alpha|}{q_1(X) + d}\right]^m, \qquad (4)$$

where $\asymp$ denotes asymptotic upper and lower bounds with respect to $\alpha \to 0$. The delay is only a function
of the false alarm constraint $P(\nu < \lambda) \leq \alpha$, the amount of information $q_1(X)$ in the densities $f_0$ and $f_1$, and the tail
exponent $d$ of the prior for $\lambda$:

$$q_1(X) = \int f_1(x) \log\frac{f_1(x)}{f_0(x)}\, \mu(dx).$$

$D_m^{\pi}(\nu)$ is also the minimum asymptotic delay achievable by any procedure with false alarm $\alpha$. The single
change point model captures problems of fault diagnosis, where the measured data is fully observed and
the change in the measurements is attributed to a single fault happening at a random time.
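As a rough numerical illustration of Eqs. (3) and (4), the following Python sketch evaluates the threshold and the first-order delay approximation. The function names and the example values of $\alpha$, $q_1(X)$ and $d$ are assumptions chosen for illustration and are not taken from the paper. The formula keeps only the leading term and ignores whatever is hidden by the asymptotic notation.

```python
import math


def srp_threshold(alpha):
    """Threshold B_alpha = (1 - alpha) / alpha from Eq. (3)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return (1.0 - alpha) / alpha


def asymptotic_delay(alpha, q1, d, m=1):
    """Leading term (|log alpha| / (q1 + d))**m of the delay in Eq. (4)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if q1 <= 0.0 or d < 0.0:
        raise ValueError("q1 must be positive and d non-negative")
    return (abs(math.log(alpha)) / (q1 + d)) ** m


if __name__ == "__main__":
    alpha = math.exp(-6)  # illustrative false alarm level, about 0.0025
    print(srp_threshold(alpha))
    print(asymptotic_delay(alpha, q1=0.5, d=0.1))
```

With these illustrative values, $|\log \alpha| = 6$ and $q_1(X) + d = 0.6$, so the leading term of the expected delay is roughly 10 samples. Increasing the information $q_1(X)$ shortens the delay in inverse proportion.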



Multiple simultaneous interacting faults can happen in a complex system with multiple interacting 
components. Consider the system in Figure 1(a). Each node in Figure 1(a) is a subsystem and each 
edge represents information shared between subsystems. There are multiple subsystems, $u_1$ to $u_5$, each
of which can fail at random times $\lambda_1$ to $\lambda_5$. A sequence of observations $X_n(u_i)$ is collected at each
subsystem $u_i$. When subsystem $u_i$ fails, the sequence $X_n(u_i)$ experiences a change. Since this sequence is
only collected by an individual subsystem, it is denominated the private information of subsystem $u_i$. Moreover,
subsystems $u_i$ and $u_j$ also collect a shared sequence of observations $X_n(u_i, u_j)$ that is influenced by
failures in either subsystem. These sequences are denominated shared information between subsystems
$u_i$ and $u_j$. Since the graph in Figure 1(a) specifies the pattern of information sharing among subsystems,
we denominate it the communication graph. Information could be shared by more than two subsystems, and
would then be represented by a hyper-edge connecting multiple nodes in the graph.

Solving the multiple interacting fault detection problem requires creating a test for each subsystem 
to detect its own failure using only the local information it collects, namely the private and shared 
information available to it. Each subsystem could use its private information sequence $X_n(u_i)$ and the
SRP rule in Eq. (2) to obtain a stopping time to detect its own fault. But clearly this procedure does not
use and benefit from the shared information at all.

Fig. 1. (a) Communication graph: information sharing graph of a system with multiple subsystems (nodes) with edges between
nodes indicating shared information and (b) Fault graph: statistical dependency between information and failure times.

The shared information can only be used if some structure on how faults interact with shared
information is specified. The fault graph (Figure 1(b)) displays graphically the statistical dependency between
shared information variables $X_n(u_i, u_j)$ and faults $\lambda_i$ and $\lambda_j$. It is natural for many practical situations to
assume that when either subsystem $u_i$ or $u_j$ fails, the shared observation sequence experiences a change
in distribution. Furthermore, after one of the subsystems has failed, the shared information relating to that
subsystem becomes useless for detecting a fault in the other subsystems using the same shared information.
Therefore, the earliest of the fault times $\lambda_i$ and $\lambda_j$ drives a change in the distribution of $X_n(u_i, u_j)$. In
general situations, alternative functional behaviors could be specified.

The interaction of faults in shared information makes it very challenging to use this information in a test 
for a subsystem. For example, a very naive test that only used a single sequence $X_n(u_i, u_j)$ to diagnose
subsystem $u_i$ would be driven to an incorrect decision if subsystem $u_j$ fails long before subsystem $u_i$
fails. Thus the integration of weak evidence to build an effective detection procedure is required.

One useful practical application of the stochastic model we discuss is in detecting faulty sensors in 
a sensor network measuring a slowly varying spatial and temporal process. Each individual sensor in a 
network can fail at some random unknown time. The nature of failure is such that plausible measurements 
are still reported. Sensors deployed geographically near each other compare their information to determine
whether they have failed or not. Before failure, measurements maintain some degree of similarity due to
the slowly varying nature of the phenomena being measured, and after failure this similarity is significantly
reduced. In our current setup, each sensor is a subsystem. The private information consists of similarity
comparisons to a sensor's own past measurements or to some reference working sensor. The shared
information consists of similarity comparisons of a sensor to nearby sensors. Various studies [9] have proposed
ad-hoc and empirical approaches to this problem, but to the best of our knowledge, no systematic theory 
has been presented. Empirical validation and the implementation details of a solution for this problem 
in the context of applications can be found in [3], [21], [29]. 
Fig. 2. Interacting subsystems setup: (a) communication graph and (b) fault graph. The information sets at time $n$ are denoted $X_n$,
$Y_n$ and $Z_n$.



A. Our contributions 

Consider the setup in Figure 2. Two subsystems, $u_1 = 1$ and $u_2 = 2$, fail at times $\lambda_1$ and $\lambda_2$ respectively.
The subsystems observe variables reflective of their state according to the communication graph in
Figure 2(a). Subsystem 1 observes a private sequence $X$ that changes its distribution according to $\lambda_1$ and
subsystem 2 observes a private sequence $Y$ that changes according to $\lambda_2$. Both subsystems observe the
shared information sequence $Z$, whose behavior changes according to the earlier of the two failure
times. All the observations are independent conditional on the change times. In this paper we will explore
the construction of fault detection rules for each subsystem that can effectively use private and shared
information. The single fault detection problem in this scenario corresponds to subsystem 1 using only its
private information to detect failure $\lambda_1$. The interacting fault problem involving two subsystems presents
a substantial analytic challenge due to the information constraints and the nature of shared information.

The first natural solution to the problem consists of each subsystem using only its private information
to make a decision about its state. In this case, the single fault SRP procedure (Eq. (2)) can be used to
obtain a stopping rule $\hat{\nu}_1$ for subsystem 1 with asymptotic delay $D_m^{\lambda_1,\lambda_2}(\hat{\nu}_1)$ given by Eq. (4). The false
alarm of the procedure is bounded by $\alpha$ and the delay is independent of the shared information $Z$. If
it were known a priori that $Z$ only changed because of subsystem 1, we could include it in the SRP
procedure to obtain a stopping time $\bar{\nu}_1$ using Eq. (2) with the test ratio

$$\Lambda_n(X, Z) = \frac{P(\lambda_1 \leq n, \lambda_2 = \infty \mid F_n(X, Z))}{P(\lambda_1 > n, \lambda_2 = \infty \mid F_n(X, Z))}. \qquad (5)$$

The resulting delay satisfies $D_m^{\lambda_1,\lambda_2}(\bar{\nu}_1) \leq D_m^{\lambda_1,\lambda_2}(\hat{\nu}_1)$, since more information can only help. In fact,
this is the smallest delay possible for this problem. But clearly the distribution of the shared information
depends on both change times and we need to propose a different strategy.

The optimal single fault procedure uses the posterior probability of a change occurring conditional on
the available observations. A natural extension of this procedure to the simultaneous fault problem is to
use the posterior probability of change for subsystem 1 conditional on both $X$ and $Z$. This probability can
be used in the definition of the single fault procedure (Eq. (2)). The false alarm is guaranteed to be less
than $\alpha$. But Theorem 1 surprisingly shows this procedure has an asymptotic delay of at least $D_m^{\lambda_1,\lambda_2}(\hat{\nu}_1)$,
the delay obtained by an optimal procedure that does not use the shared information $Z$. Therefore, it is
not trivial to include shared information in a manner that reduces delay.

Instead we propose a procedure based on the following observation: while neither subsystem has
failed, the shared information $Z$ is helpful in diagnosing both, and after a failure it is only useful in
diagnosing the first subsystem to fail. For subsystem 1, we initially test for its failure assuming subsystem
2 has not failed (i.e. $\lambda_2 > n$) using both private information $X$ and shared information $Z$. Similarly we
test subsystem 2. If subsystem 2 fails, we switch the test in subsystem 1 to a posterior probability test
based only on its private information $X$. The proposed procedure is called stopping time with information
exchange (STIE) and requires exchanging a single bit of information between subsystems so they can
communicate their decisions. We denote it $\tilde{\nu}_1$ for subsystem 1 and $\tilde{\nu}_2$ for subsystem 2.

The first question regards the false alarm of STIE. Theorem 2 shows the false alarm for subsystem
1 is bounded by the sum of $\alpha$ and the error coupling probability $\Delta(\bar{\nu}_2)$. The error coupling probability
captures the probability of subsystem 1 being misled to believe it has failed due to a truly failed subsystem 2
taking excessively long to declare a failure. If the stopping times for both subsystems are asymptotically
decoupled (i.e. the error coupling probability is smaller than $\alpha$), then we can guarantee a false alarm of
less than $\alpha$ for STIE. Theorem 3 shows this happens when certain natural relationships hold between
the amount of private and shared information. The analysis uses large deviation comparisons of stopping 
times and is of independent interest. 

The remaining question regards the delay performance of STIE. Theorem 5 shows STIE achieves
an improved asymptotic delay performance as $\alpha \to 0$:

$$D_m^{\lambda_1,\lambda_2}(\tilde{\nu}_1) = \left[ D_m^{\lambda_1,\lambda_2}(\bar{\nu}_1)(1 - \delta_\alpha) + D_m^{\lambda_1,\lambda_2}(\hat{\nu}_1)\,\delta_\alpha \right](1 + o(1)),$$

where

$$D_m^{\lambda_1,\lambda_2}(\bar{\nu}_1) = \left[\frac{|\log \alpha|}{q_1(X) + q_1(Z) + d_1}\right]^m, \qquad D_m^{\lambda_1,\lambda_2}(\hat{\nu}_1) = \left[\frac{|\log \alpha|}{q_1(X) + d_1}\right]^m,$$

and $\delta_\alpha$ is a quantity strictly greater than 0 and less than 1. This quantity reflects how much the shared
information benefits subsystem 1 as opposed to subsystem 2. Notice $D_m^{\lambda_1,\lambda_2}(\tilde{\nu}_1) \leq D_m^{\lambda_1,\lambda_2}(\hat{\nu}_1)$. Theorem 4
then shows that under mild conditions this delay matches the best possible performance for any procedure
in an appropriately defined set of procedures with joint false alarm $\alpha$. These conditions are the same
required for the error coupling probability to be asymptotically small. This surprising result shows that
under mild conditions we can decouple the change behavior of the shared information, and obtain an
asymptotically optimal procedure for the multiple simultaneous interacting fault problem with private and shared
information. The proposed solution sheds light on how to construct solutions for other interaction
structures.

We conclude the paper with various simulation studies that show the validity of the proposed analytic 
insights. To the best of our knowledge this is the first paper that studies a multiple simultaneous interacting 
fault problem with information sharing constraints that impose partial observability at each subsystem. 



B. Related work 

Various procedures have been proposed for single fault diagnosis with full observation [1]. The
asymptotic performance of these procedures has been analyzed in various papers, under different
performance criteria and settings [2], [11], [13], [20], [25].

Information constraints arise naturally in the context of sensor networks. In such systems it is desirable 
for procedures to only use information from geographically close sensors to limit communication costs 
and improve network lifetimes. Such constraints lead to a distributed processing requirement for single
fault problems. Various authors [15], [16], [27], [28] analyze distributed versions of single change point 
problems, and derive an optimal rule for some cases. To the best of our knowledge, this is the first paper 
that introduces a model with multiple interacting change points and diagnosis restricted by a partial
observability condition, both constraints that are important in practice. 

In contrast, multiple simultaneous fault problems have been less studied. Bayesian sequential change 
diagnosis [5] studies a problem formulation where a single fault occurs but there are M causes for 
failure. The goal is to detect when the fault happens and what caused it. Complete observability of the 
information is assumed. Our proposed formulation in contrast imposes observation and fault interaction 
structures for multiple simultaneous faults creating a completely new class of problems that cannot be 
mapped into this framework. 

There is a sizable literature on sensor failure detection in the context of sensor networks [4], including
detection of failures in multiple sensors [9], [10]. Many heuristics based on practical requirements have
been proposed [14], [17], [6], [26], but none has optimality guarantees, nor is the change point structure
properly explored. In contrast we propose an algorithm with performance guarantees using a novel
change point formulation. In fact, our analysis in this paper applies to commonly used correlation tracking 
heuristics (e.g. [19]) and shows that without properly structured stopping times that exchange information, 
these sensor fault detection heuristics can perform very poorly. 

C. Paper organization 

The paper is organized as follows. Section II states the problem in more detail and establishes some 
basic notation. Section III investigates the delay of the natural extension procedure based on posterior 
probabilities. Section IV introduces STIE (Localized Stopping Time with Information Exchange) and
analyzes its performance. It also calculates the best possible delay achievable by any procedure.
Section V presents simulation examples. Section VI concludes the paper with a discussion of the results
and avenues of future work. Section VII presents technical assumptions and proofs.

Parts of this work have been presented in IPSN 2008 (Information Processing for Sensor Networks) 
and the 2nd International Workshop on Sequential Analysis. 

II. Problem statement and notation 

Consider the setup given by the communication graph in Figure 2(a) and fault graph in Figure 2(b).
Two subsystems 1 and 2 fail at random times $\lambda_1$ and $\lambda_2$ respectively. Subsystem 1 observes the private
information sequence of random variables $X = \{X_n, n \geq 1\}$. The distribution of this sequence experiences
a change due to the change time $\lambda_1$ of subsystem 1. Using our earlier notation, $f_0(X) \xrightarrow{\lambda_1} f_1(X)$, where
$f_0(X)$ and $f_1(X)$ are known densities specific to random variable $X$. Similarly subsystem 2 observes
the private information sequence $Y = \{Y_n, n \geq 1\}$ and its distribution follows $f_0(Y) \xrightarrow{\lambda_2} f_1(Y)$. Both
subsystems observe the shared information as the random variable sequence $Z = \{Z_n, n \geq 1\}$. Its
distribution changes according to $f_0(Z) \xrightarrow{\min(\lambda_1, \lambda_2)} f_1(Z)$, i.e.

$$Z_n \sim \begin{cases} f_0(Z), & n < \min(\lambda_1, \lambda_2), \\ f_1(Z), & n \geq \min(\lambda_1, \lambda_2). \end{cases}$$

The sequence of random variables for $X$ between time $k$ and $n$ is denoted as $X_k^n$, and similarly for other
random variables. The goal of multiple subsystem simultaneous fault detection is to construct stopping
times $\nu_1$ for subsystem 1, only using $\{X_1^n, Z_1^n\}$ at time $n$, and $\nu_2$ for subsystem 2, only using $\{Y_1^n, Z_1^n\}$
at time $n$, that detect whether $\lambda_1 \leq n$ and $\lambda_2 \leq n$ efficiently. Efficiency is measured according to the
performance metrics in Section II-A, i.e., each stopping time achieves a small detection delay for a given
false alarm. The multiple interacting fault detection problem is difficult due to the interacting nature of
the faults and the information sharing pattern imposed by the communication graph. Furthermore, these
constraints make it hard, if not impossible, to find an optimal stopping time in the spirit of Shiryaev
[23], i.e. one that is non-asymptotically optimal. Therefore, it is natural to seek stopping times that can achieve
asymptotic optimality.

To conclude, we detail further the Bayesian formulation of the multiple interacting fault problem.
The fault graph in Figure 2(b) details the probability dependency structure. Conditional on the change
times $\lambda_1$ and $\lambda_2$, the random variables $X$, $Y$ and $Z$ are all independent. We also assume the joint prior
distribution of the change times is given by $P(\lambda_1 = k_1, \lambda_2 = k_2) = \pi_1(k_1)\pi_2(k_2)$. For convenience, define
the cumulative quantities $\Pi_{1,n} = P(\lambda_1 > n)$ and $\Pi_{2,n} = P(\lambda_2 > n)$.
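The observation model of this section can be simulated with a few lines of Python. The sketch below is only an illustration under assumptions that are not part of the paper: the priors are taken to be geometric, all pre-change densities are standard normal, and all post-change densities are unit-variance normals with a common mean shift. The function names are hypothetical.

```python
import numpy as np


def draw_change_times(rho1, rho2, rng):
    """Independent geometric priors pi_u(k) = rho_u * (1 - rho_u)**(k - 1), k >= 1 (assumed)."""
    return int(rng.geometric(rho1)), int(rng.geometric(rho2))


def simulate_observations(n, lam1, lam2, shift=1.0, rng=None):
    """Generate X, Y, Z of length n; each switches from N(0, 1) to N(shift, 1) (assumed densities)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, n + 1)  # time index 1..n
    # Private sequence of subsystem 1 changes at lam1.
    x = rng.standard_normal(n) + shift * (t >= lam1)
    # Private sequence of subsystem 2 changes at lam2.
    y = rng.standard_normal(n) + shift * (t >= lam2)
    # Shared sequence changes at the earliest of the two failure times.
    z = rng.standard_normal(n) + shift * (t >= min(lam1, lam2))
    return x, y, z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam1, lam2 = draw_change_times(0.01, 0.01, rng)
    x, y, z = simulate_observations(500, lam1, lam2, rng=rng)
    print(lam1, lam2, x.shape, y.shape, z.shape)
```

Conditional on the drawn change times the three sequences are generated independently, which mirrors the conditional independence encoded by the fault graph.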

The $\sigma$-field generated by a sequence such as $X_1^n$ is denoted by $\mathcal{F}_X^n$. For the fields of joint variables, we
use notation such as $\mathcal{F}_{X,Y}^n$. Based on these definitions we can formalize the restriction that subsystem 1
can only use random variables $X$ and $Z$ for its decision, whereas subsystem 2 can only use random
variables $Y$ and $Z$ for its decision, by requiring that the respective stopping rules be localized:

Definition 1 (Localized stopping time). A localized stopping time for subsystem 1 is a stopping time
$\nu_1 \in \mathcal{F}_{X,Z}$. Similarly, a localized stopping time for subsystem 2 is a stopping time $\nu_2 \in \mathcal{F}_{Y,Z}$.

The probability measure in the joint space of random variables when the change happens at $\lambda_1 = k_1$
and $\lambda_2 = k_2$ is defined as:

$$\begin{aligned}
P_{k_1,k_2}(X_1^n, Y_1^n, Z_1^n) &= P_{k_1}(X_1^n)\, P_{k_1 \wedge k_2}(Z_1^n)\, P_{k_2}(Y_1^n) \\
&= \prod_{i=1}^{k_1 - 1} f_0(X_i) \prod_{i=k_1}^{n} f_1(X_i) \prod_{i=1}^{k_1 \wedge k_2 - 1} f_0(Z_i) \prod_{i=k_1 \wedge k_2}^{n} f_1(Z_i) \prod_{i=1}^{k_2 - 1} f_0(Y_i) \prod_{i=k_2}^{n} f_1(Y_i) \\
&= L_{k_1}(X_1^n)\, L_{k_1 \wedge k_2}(Z_1^n)\, L_{k_2}(Y_1^n).
\end{aligned}$$

Here $L_k(X_1^n)$ denotes the product of densities for $X$, and similarly for other variables. From the
definitions, when $\lambda_2 = \infty$ we have $P_{k_1,\infty}(X_1^n, Y_1^n, Z_1^n) = P_{k_1}(X_1^n)\, P_{k_1}(Z_1^n)\, P_{\infty}(Y_1^n)$. The appropriate
marginalized measures are also defined, such as:

$$P_{\lambda_1,\lambda_2}(X_1^n, Y_1^n, Z_1^n) = \sum_{k_1=1}^{\infty} \sum_{k_2=1}^{\infty} \pi_1(k_1)\pi_2(k_2)\, P_{k_1,k_2}(X_1^n, Y_1^n, Z_1^n).$$

In our notation $E_{k_1,k_2}$ refers to expectations with respect to the measure $P_{k_1,k_2}(X_1^n, Y_1^n, Z_1^n)$. It will
be useful to define the log-likelihood ratio of sample $i$ for random variable $X_i$ and the accumulated
log-likelihood:

$$r_i(X) = \log\frac{f_1(X_i)}{f_0(X_i)}, \qquad R_k^n(X) = \sum_{i=k}^{n} r_i(X). \qquad (6)$$

Similar definitions hold for all random variables. We make assumptions about the expectations of the
log-likelihoods under pre-change and post-change distributions. In particular, assume they are all finite
($*$ denotes don't care):

$$E_{1,*}[r_1(X)] = \int f_1(x) \log\frac{f_1(x)}{f_0(x)}\, \mu(dx) = D(f_1(X) \,\|\, f_0(X)) = q_1(X),$$

where $\mu$ is the Lebesgue measure. Similarly,

$$E_{\infty,*}[r_1(X)] = \int f_0(x) \log\frac{f_1(x)}{f_0(x)}\, \mu(dx) = -D(f_0(X) \,\|\, f_1(X)) = -q_0(X).$$

For $Y$ a similar assumption holds, only noting that expectations will be with respect to $E_{*,0}$ and $E_{*,\infty}$. For
$Z$, again the definitions hold, but expectations should be with respect to $E_{0,0}$ and $E_{\infty,\infty}$. The assumption
is that $q_0(X)$, $q_1(X)$, $q_0(Z)$, $q_1(Z)$, $q_0(Y)$ and $q_1(Y)$ are all positive and finite.
Further detailed technical assumptions are stated in Section VII-A.

A. Performance metrics 

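Denote the fault detection rule for subsystem $u$ by the stopping time $\nu_u$ for $u = 1$ and $u = 2$. In the
change point literature, such a stopping time is evaluated according to two metrics: probability of false
alarm and detection delay, see e.g., [25].

Definition 2 (Probability of false alarm). Given a stopping time $\nu_u$ and the change time $\lambda_u$, the false
alarm probability at $\lambda_u = k_u$ is defined as

$$P_{fa}^{k_1,k_2}(\nu_u) = P_{k_1,k_2}(\nu_u < k_u).$$

The false alarm probability for procedure $\nu_u$ is given by

$$P_{fa}^{\pi_1,\pi_2}(\nu_u) = P_{\lambda_1,\lambda_2}(\nu_u < \lambda_u) = \sum_{k_1=1}^{\infty} \sum_{k_2=1}^{\infty} \pi_1(k_1)\pi_2(k_2)\, P_{k_1,k_2}(\nu_u < k_u).$$

The marginal false alarm probabilities for procedures $\nu_1$ and $\nu_2$ are

$$MP_{fa}^{\pi_1,\pi_2}(\nu_1) = P_{\lambda_1,\infty}(\nu_1 < \lambda_1) \quad \text{and} \quad MP_{fa}^{\pi_1,\pi_2}(\nu_2) = P_{\infty,\lambda_2}(\nu_2 < \lambda_2).$$

The conditional marginal false alarm probabilities for procedures $\nu_1$ and $\nu_2$ are

$$MP_{fa}^{\pi_1,\pi_2}(\nu_1 \mid \lambda_2 < \lambda_1) = \sum_{k_1=1}^{\infty} \pi_1(k_1) \sum_{k_2=1}^{k_1-1} \pi_2(k_2)\, P_{\infty,k_2}(\nu_1 < k_1) \quad \text{and}$$

$$MP_{fa}^{\pi_1,\pi_2}(\nu_2 \mid \lambda_1 < \lambda_2) = \sum_{k_2=1}^{\infty} \pi_2(k_2) \sum_{k_1=1}^{k_2-1} \pi_1(k_1)\, P_{k_1,\infty}(\nu_2 < k_2).$$

Definition 3 (Detection delay). The $m$-th moment of the delay of a sequential procedure $\nu_u$ for change
time $\lambda_u = k_u$ is defined as

$$D_m^{k_1,k_2}(\nu_u) = E_{k_1,k_2}\left[(\nu_u - k_u)^m \mid \nu_u \geq k_u\right].$$

The $m$-th moment of the detection delay is

$$D_m^{\pi_1,\pi_2}(\nu_u) = E_{\lambda_1,\lambda_2}\left[(\nu_u - \lambda_u)^m \mid \nu_u \geq \lambda_u\right] = \frac{\sum_{k_1=1}^{\infty}\sum_{k_2=1}^{\infty} \pi_1(k_1)\pi_2(k_2)\, P_{k_1,k_2}(\nu_u \geq k_u)\, D_m^{k_1,k_2}(\nu_u)}{P_{\lambda_1,\lambda_2}(\nu_u \geq \lambda_u)}.$$

A good procedure achieves small (even minimum) delay $D_m^{\pi_1,\pi_2}(\nu_u)$, while maintaining $P_{fa}^{\pi_1,\pi_2}(\nu_u) \leq \alpha$
for a pre-specified $\alpha$. An optimal detection procedure for subsystem $u$ is a procedure $\nu_u$ for which
the delay $D_m^{\pi_1,\pi_2}(\nu_u)$ is minimized while keeping the false alarm below a chosen probability $\alpha$, so that
$P_{fa}^{\pi_1,\pi_2}(\nu_u) \leq \alpha$. Such a rule is called an optimal sequential procedure. Notice that a procedure that
satisfies $P_{fa}^{\pi_1,\pi_2}(\nu_u) \leq \alpha$ does not necessarily satisfy $MP_{fa}^{\pi_1,\pi_2}(\nu_u) \leq \alpha$, and in particular this is true for the
optimal sequential procedure.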
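In a simulation study the two metrics above can be estimated by Monte Carlo from repeated runs. The following Python sketch assumes that each run yields one stopping time and one true change time for the subsystem of interest; the function name `empirical_metrics` is hypothetical and this is not the authors' evaluation code.

```python
import numpy as np


def empirical_metrics(stop_times, change_times, m=1):
    """Estimate P(nu < lambda) and E[(nu - lambda)**m | nu >= lambda] from paired samples."""
    nu = np.asarray(stop_times, dtype=float)
    lam = np.asarray(change_times, dtype=float)
    false_alarm = nu < lam                # runs that stopped before the change
    pfa = float(false_alarm.mean())
    detected = ~false_alarm               # runs that stopped at or after the change
    if detected.any():
        delay = float(((nu[detected] - lam[detected]) ** m).mean())
    else:
        delay = float("nan")              # no valid detection to average over
    return pfa, delay


if __name__ == "__main__":
    # Four illustrative runs: one false alarm and three detections.
    print(empirical_metrics([5, 12, 9, 20], [10, 10, 9, 15]))
```

The delay is averaged only over runs without a false alarm, matching the conditioning on $\nu_u \geq \lambda_u$ in Definition 3.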
III. Localized fault detection without information exchange

One approach to solving the multiple interacting fault detection problem is to use a methodology 
inspired by solving a single change point problem. We first review the relevant solution for a single 
change point problem and then we describe the natural extension which is shown to be unable to exploit 
the common information available for detection of both change points. 



A. Test statistic for a single change point 

Suppose a single subsystem fails at a random time $\lambda$, with distribution $P(\lambda = n) = \pi_1(n)$. The
observations $X$ are independent and identically distributed random variables, with distribution $f_0$ before
the change and $f_1$ after the change. The fault detection formulation for this single subsystem is the standard
single change point detection problem.

Shiryaev [24] showed that an optimal sequential procedure is the procedure that tests the hypothesis
$H_1: \lambda \leq n$ against $H_0: \lambda > n$ at each $n$, using the observations $X_1, \ldots, X_n$. The Shiryaev-Roberts-Pollak
(SRP) sequential procedure is a threshold test on the posterior probability as shown in Eq. (1). The SRP
test quantity can be further developed as

$$\Lambda_n(X) = \frac{P(\lambda \leq n \mid \mathcal{F}_X^n)}{1 - P(\lambda \leq n \mid \mathcal{F}_X^n)} = \frac{\sum_{k=0}^{n} \pi_1(k)\, L_k(X_1^n)}{\sum_{k=n+1}^{\infty} \pi_1(k) \prod_{r=1}^{n} f_0(X_r)} = \Lambda_0 + \Pi_n^{-1} \sum_{k=1}^{n} \pi_1(k)\, e^{R_k^n(X)},$$

where $R_k^n(X)$ is defined in Eq. (6), $\Lambda_0 = \pi_1(0)/(1 - \pi_1(0))$ and $\Pi_n = P(\lambda > n)$. This test quantity is used in the
stopping time in Eq. (2) with the threshold rule given by Eq. (3) to obtain the SRP procedure. Tartakovsky et al.
[25] showed the SRP procedure achieves the optimal asymptotic delay for the problem of minimizing the
expected delay constrained to a false alarm probability $\alpha$ (i.e. $P_{fa}^{\pi}(\nu^s) \leq \alpha$). Furthermore, the asymptotic
delay as $\alpha \to 0$ is given by Eq. (4), which matches the lower bound on the delay of any procedure with
false alarm $\alpha$.

The single change point problem is considerably simpler than the multiple change problem, since once 
a change is detected, it is attributed to a unique fault, and there is no chance of confusion with other 
potentially failed subsystems. We summarize the important facts in the following definition: 

Definition 4 (Sequential test statistic). The generalization of the test for an SRP procedure using random
variables $X$ and $Z$ is

$$\Lambda_n(X, Z) = \Lambda_0 + \Pi_{1,n}^{-1} \sum_{k=1}^{n} \pi_1(k)\, e^{R_k^n(X) + R_k^n(Z)}. \qquad (7)$$

This corresponds to the ratio in Eq. (5). The corresponding stopping time is $\bar{\nu}_1 = \nu^s(X, Z)$ and uses the
threshold in Eq. (3). Similarly we can define $\Lambda_n(X)$, $\Lambda_n(Y, Z)$ and $\Lambda_n(Y)$, using $\Pi_2$, $Y$ and $Z$. The
corresponding stopping times are $\hat{\nu}_1 = \nu^s(X)$, $\bar{\nu}_2 = \nu^s(Y, Z)$ and $\hat{\nu}_2 = \nu^s(Y)$.

Remark 1. For the rest of the paper, we assume without loss of generality that $\Lambda_0 = 0$ for the SRP
procedure.
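For a concrete prior the statistic in Definition 4 can be updated recursively, without storing past observations. The Python sketch below assumes a geometric prior $\pi(k) = \rho(1-\rho)^{k-1}$, $k \geq 1$, which is an illustrative choice and not stated in the text. In that case $\Pi_n = (1-\rho)^n$ and, with $\Lambda_0 = 0$, the sum in Eq. (7) satisfies $\Lambda_n = (\Lambda_{n-1} + \rho)\, e^{r_n} / (1 - \rho)$, where $r_n$ is the log-likelihood ratio of the newest sample. The function names are hypothetical.

```python
import math


def srp_statistics(llr, rho):
    """Return [Lambda_1, ..., Lambda_n] for per-sample log-likelihood ratios llr.

    Assumes a geometric prior with parameter rho and Lambda_0 = 0 (Remark 1).
    """
    stats, lam = [], 0.0
    for r in llr:
        # One-step update derived from the sum over candidate change times.
        lam = (lam + rho) * math.exp(r) / (1.0 - rho)
        stats.append(lam)
    return stats


def srp_stopping_time(llr, rho, alpha):
    """First n (1-indexed) with Lambda_n >= (1 - alpha) / alpha, or math.inf if never."""
    threshold = (1.0 - alpha) / alpha
    for n, lam in enumerate(srp_statistics(llr, rho), start=1):
        if lam >= threshold:
            return n
    return math.inf


if __name__ == "__main__":
    # Constant evidence of 2 nats per sample in favour of a change.
    print(srp_stopping_time([2.0] * 10, rho=0.1, alpha=0.01))
```

To evaluate $\Lambda_n(X, Z)$ the input would be the per-sample sums $r_i(X) + r_i(Z)$, while $\Lambda_n(X)$ uses $r_i(X)$ alone.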
B. Test statistic for multiple interacting change points

In this section we study the first natural approach to solving the multiple interacting fault detection 
problem. We focus on subsystem 1. Heuristically, a threshold test in the posterior probability seems a 
reasonable choice for stopping time. For the single change point case this is an optimal choice. In the 
modified framework, such a choice may not be optimal, but it is certainly an attractive and simple test. 
Intuitively, this is the first test one would consider. The posterior probability test can be written as: 

$$\nu_1(X, Z) = \inf\{n : p_n(X, Z) \geq 1 - \alpha\}, \qquad p_n(X, Z) = P_{\lambda_1,\lambda_2}(\lambda_1 \leq n \mid \mathcal{F}_{X,Z}^n). \qquad (8)$$

To put this into standard form, notice that

$$\frac{p_n(X, Z)}{1 - p_n(X, Z)} \geq \frac{1 - \alpha}{\alpha}$$

is an equivalent test to the original. Then the test statistic is

$$\Lambda_n^{ex}(X, Z) = \frac{P_{\lambda_1,\lambda_2}(\lambda_1 \leq n \mid \mathcal{F}_{X,Z}^n)}{P_{\lambda_1,\lambda_2}(\lambda_1 > n \mid \mathcal{F}_{X,Z}^n)}.$$

From the problem definition, we can compute the probabilities involved in the statistic $\Lambda_n^{ex}(X, Z) = a_n / b_n$:

$$a_n = \sum_{k_1=1}^{n} \sum_{k_2=1}^{\infty} \pi_1(k_1)\pi_2(k_2)\, L_{k_1}(X_1^n)\, L_{k_1 \wedge k_2}(Z_1^n),$$

$$b_n = \Pi_{1,n}\, L_{n+1}(X_1^n) \left[ \Pi_{2,n}\, L_{n+1}(Z_1^n) + \sum_{k_2=1}^{n} \pi_2(k_2)\, L_{k_2}(Z_1^n) \right].$$

Similarly, we can define a stopping time based on $\Lambda_n^{ex}(Y, Z)$ for subsystem 2. The first important
observation is that computing the test quantity is non-trivial. More importantly and somewhat surprisingly,
no delay reduction benefit is obtained from using the shared information:

Theorem 1. Assume $q_0(Z) > q_1(Z)$ or $\pi_2(k_2) > 0$ for $k_2 > K_2$. For the posterior threshold test
$\nu_1(X, Z)$ without information exchange given by Eq. (8), the delay satisfies (as $\alpha \to 0$)

$$D_m^{\pi_1,\pi_2}(\nu_1(X, Z)) \geq D_m^{\pi_1,\pi_2}(\nu^s(X)).$$

For the threshold test $\nu_2(Y, Z)$, similar bounds apply.

The result shows that the performance of the rule does not depend on the statistics of the shared
information $Z$. This is a surprising result, since we expect an improvement in performance of the order
of the KL divergence ($q_1(Z)$) for the pre- and post-change distributions of $Z$.

Thus, in this procedure the shared information is not useful in determining which subsystem failed. 
The information in either pair (X, Z) or (Y, Z) by itself is not helpful in determining whether the change 
in Z is induced by a failure in subsystem 1 or in subsystem 2. In the hypothesis test, the null hypothesis 
as well as the alternative hypothesis incorporate the information that a change in the shared information 
could have happened because the other subsystem failed. 
IV. STIE: a localized stopping time with information exchange

In this section we propose localized STIE (Stopping Time with Information Exchange), an interacting
stopping time method that attempts to overcome the limitations discussed in the previous section and 
benefit from shared information. 

The structure of the interaction between faults leads to an observation: before either subsystem has 
failed, the shared information helps both decide whether they are failed or not; after one of them fails, 
the shared information is not useful for the non-failed one. STIE relies on this observation by initially 
computing a test statistic for subsystem 1 that assumes subsystem 2 is not failed. This is just the standard 
SRP procedure, that uses both X and Z and can be computed as shown in Definition 4. A similar test 
statistic is computed for subsystem 2. 

Based on these statistics, we can define the stopping rule $\bar{\nu}_1$ for subsystem 1:

$$\bar{\nu}_1 = \min\{n : \Lambda_n(X, Z) \geq B_\alpha\} = \nu^s(X, Z), \qquad (9)$$

and similarly $\bar{\nu}_2$ for subsystem 2. Each subsystem computes this test until one of them believes it has
failed. Say subsystem 2 believes it has failed at time $n$ (so $\bar{\nu}_2 = n$) and before subsystem 1 (i.e. $\bar{\nu}_1 > n$).
In STIE, subsystem 2 communicates its decision to subsystem 1. Then subsystem 1 should not use the
shared information anymore, else it may be misled to think it has failed due to the change in $Z$. From
this point onwards, subsystem 1 computes the SRP posterior rule based only on its private information
$X$ and computes the stopping rule $\hat{\nu}_1$ until failure:

$$\hat{\nu}_1 = \min\{n : \Lambda_n(X) \geq B_\alpha\} = \nu^s(X). \qquad (10)$$

If instead subsystem 1 had declared failure first using $\bar{\nu}_1$, subsystem 2 would use an analogous stopping
rule $\hat{\nu}_2$. We can formally summarize STIE in terms of composite stopping rules $\tilde{\nu}_1$ for subsystem 1

$$\tilde{\nu}_1 = \bar{\nu}_1\, I(\bar{\nu}_1 \leq \bar{\nu}_2) + \max(\hat{\nu}_1, \bar{\nu}_2)\, I(\bar{\nu}_1 > \bar{\nu}_2) \qquad (11)$$

and $\tilde{\nu}_2$ for subsystem 2

$$\tilde{\nu}_2 = \bar{\nu}_2\, I(\bar{\nu}_2 \leq \bar{\nu}_1) + \max(\hat{\nu}_2, \bar{\nu}_1)\, I(\bar{\nu}_2 > \bar{\nu}_1). \qquad (12)$$

In the composite rule $\tilde{\nu}_1$, the exchanged bit is represented by the indicator $I(\bar{\nu}_1 > \bar{\nu}_2)$. The max operator
covers the situation where the private information of a subsystem indicates that it has already failed (e.g.,
$\hat{\nu}_1 < \bar{\nu}_2 = n$), in which case it should stop immediately at the present time ($\tilde{\nu}_1 = \bar{\nu}_2 = n$).
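The composite rules in Eqs. (11) and (12) reduce to a small amount of branching once the four component stopping times are known. The Python sketch below is an illustration only: the function name is hypothetical, ties between the two shared-information rules are resolved by letting both subsystems stop (an assumption about the non-strict inequality in the indicators), and a rule that never fires can be passed as `float("inf")`.

```python
def stie_stopping_times(nu_bar_1, nu_bar_2, nu_hat_1, nu_hat_2):
    """Combine SRP stopping times into the composite STIE times (tilde_1, tilde_2).

    nu_bar_u: stopping time of subsystem u using private and shared information.
    nu_hat_u: stopping time of subsystem u using private information only.
    """
    if nu_bar_1 <= nu_bar_2:
        # Subsystem 1 declares first (or simultaneously) with the shared-information test.
        tilde_1 = nu_bar_1
    else:
        # Subsystem 2 declared first: subsystem 1 falls back to its private test,
        # but cannot stop before it receives the bit at time nu_bar_2.
        tilde_1 = max(nu_hat_1, nu_bar_2)
    if nu_bar_2 <= nu_bar_1:
        tilde_2 = nu_bar_2
    else:
        tilde_2 = max(nu_hat_2, nu_bar_1)
    return tilde_1, tilde_2


if __name__ == "__main__":
    # Subsystem 2 declares at time 25; subsystem 1 then relies on its private test.
    print(stie_stopping_times(40, 25, 60, 30))
```

In the example call, subsystem 2 stops at time 25 and subsystem 1 switches to its private rule, which fires at time 60. If the private rule of subsystem 1 had already fired at time 10, the max operator would make it stop at time 25, the moment the bit arrives.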

The proposed stopping rules can be implemented in the system in Figure 2(b) in a distributed way. 
In STIE there is an information exchange between subsystems, but it is constrained to a single bit that 
informs when a subsystem's statistic has crossed its threshold. Then the other subsystem stops using the 
shared information (that is, it recomputes its own statistics without using shared information). This is a 
new feature of the model investigated in this paper. Previous literature in distributed hypothesis testing
focused on the case where all subsystems observe the same hypothesis. Here we have a problem where
subsystems observe hypotheses that interact with each other.

Another important benefit of STIE is that it can be computed efficiently, as each subsystem only 
computes two recursions following Definition 4, without requiring the storage of all observed values of 
the random variables X, Y, Z. In [21] we describe a detailed efficient implementation in an application 
setting. 
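
The two recursions per subsystem can be sketched as follows. Definition 4 is not reproduced in this section, so the sketch assumes the standard Shiryaev posterior recursion with a geometric prior of parameter `rho`, independent private and shared observations (so that their likelihood ratios multiply), and a posterior threshold of `1 - alpha`; all function and parameter names are hypothetical.

```python
def posterior_update(p_prev, lr, rho):
    """One step of the Shiryaev posterior recursion for a geometric(rho) prior.

    p_prev: posterior probability that the change happened by time n-1
    lr:     likelihood ratio f1/f0 of the observation(s) at time n
    """
    # Probability that the change has happened by time n, before seeing x_n.
    predicted = p_prev + (1.0 - p_prev) * rho
    num = predicted * lr
    return num / (num + (1.0 - predicted))


def run_subsystem(lr_private, lr_shared, rho, alpha):
    """Track both statistics of one subsystem and return (nu, nu_hat).

    nu uses private and shared observations; nu_hat uses private ones only.
    A value of None means the threshold 1 - alpha was never crossed.
    """
    p_joint = p_priv = 0.0
    nu = nu_hat = None
    for n, (lp, ls) in enumerate(zip(lr_private, lr_shared), start=1):
        p_joint = posterior_update(p_joint, lp * ls, rho)
        p_priv = posterior_update(p_priv, lp, rho)
        if nu is None and p_joint >= 1.0 - alpha:
            nu = n
        if nu_hat is None and p_priv >= 1.0 - alpha:
            nu_hat = n
    return nu, nu_hat
```

Only the two current posteriors are stored, so memory use does not grow with the number of observations.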

STIE is summarized as follows. Each subsystem computes posteriors as if the other subsystem were
always working, until the time one of them declares itself failed. Up to this point both subsystems
use the shared information. When one subsystem is thought to have failed, the other stops using the shared
information and recomputes the change point test using only its private information.

In the remainder of the section we compute the false alarm probability and the detection delay for 
STIE. The detection with information exchange algorithm is interesting if we are able to show that for 
a given false alarm rate $O(\alpha)$, it achieves expected delays smaller than if the shared information is not
used. 

A. Performance analysis: false alarm 

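From Definition 2, the false alarm for subsystem 1 is given by

$$P_{FA}^{\pi_1,\pi_2}(\bar{\nu}_1) = P_{\lambda_1,\lambda_2}(\bar{\nu}_1 < \lambda_1). \qquad (13)$$

Moreover, by the design choice of the threshold for the tests $\nu_1$ and $\hat{\nu}_1$ that form STIE, the false alarm when
there is no change observed in subsystem 2 (i.e. $\lambda_2 = \infty$) is bounded:

$$P_{FA}^{\pi_1,\infty}(\nu_1) = P_{\lambda_1,\infty}(\nu_1 < \lambda_1) \le \alpha,$$
$$P_{FA}^{\pi_1,\infty}(\hat{\nu}_1) = P_{\lambda_1,\infty}(\hat{\nu}_1 < \lambda_1) \le \alpha.$$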

Unfortunately these guarantees do not translate into a guarantee for the false alarm in Eq. (13) of the
procedure composed of both tests. Analyzing Eqns. (11) and (12) we notice that subsystem 1 can raise
two kinds of false alarm at some time n: one raised without any change ($\lambda_1 > n$ and $\lambda_2 > n$), and
another raised when the shared information experiences a change due to a fault in subsystem 2 ($\lambda_2 < n$).
Based on this observation we define the error coupling probability:

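Definition 5 (Error coupling probability). The error coupling probabilities of the stopping times in a
set of procedures $(\bar{\nu}_1, \bar{\nu}_2)$ are defined as

$$\xi_{\lambda_1,\lambda_2}(\bar{\nu}_1) = P_{\lambda_1,\lambda_2}(\nu_1 \le \nu_2,\ \lambda_2 \le \bar{\nu}_1 < \lambda_1),$$
$$\xi_{\lambda_1,\lambda_2}(\bar{\nu}_2) = P_{\lambda_1,\lambda_2}(\nu_2 \le \nu_1,\ \lambda_1 \le \bar{\nu}_2 < \lambda_2).$$

A regular fault detection procedure is a set of procedures for which the following conditions hold:

$$\lim_{\alpha \to 0} \xi_{\lambda_1,\lambda_2}(\bar{\nu}_1) = 0,$$
$$\lim_{\alpha \to 0} \xi_{\lambda_1,\lambda_2}(\bar{\nu}_2) = 0.$$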
A strong fault detection procedure is a set of procedures that has 

The importance of this definition is shown in Theorem 2: the false alarm for the composite procedure
STIE can be shown to be bounded by the sum of the desired false alarm rate ($\alpha$) and the error coupling
probability. This probability measures the degree of coupling caused by the competing change time. If
it is of order $O(\alpha)$, we say the procedure is regular. As a comparison, if the event $\mathcal{E} = \{\bar{\nu}_1 < \lambda_1\}$ were
contained in the union of the events $\mathcal{E}_1 = \{\nu_1 < \lambda_1, \lambda_2 = \infty\}$ and $\mathcal{E}_2 = \{\hat{\nu}_1 < \lambda_1, \lambda_2 = \infty\}$, then a
direct union bound would show the false alarm to be bounded by $2\alpha$.

Theorem 2 (False alarm of STIE). 
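(a) The probability of false alarm of subsystem 1 for the joint procedure with information exchange
(STIE) is bounded by:

$$P_{\lambda_1,\lambda_2}(\bar{\nu}_1 < \lambda_1) \le \alpha + \xi_{\lambda_1,\lambda_2}(\bar{\nu}_1),$$

where $\xi_{\lambda_1,\lambda_2}(\bar{\nu}_1)$ is the error coupling probability of Definition 5.

(b) The marginal probability of false alarm of subsystem 1 for STIE is bounded by:

$$P_{\lambda_1,k_2}(\bar{\nu}_1 < \lambda_1) \le \alpha + \xi_{\lambda_1,k_2}(\bar{\nu}_1).$$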

The intuition behind Theorem 2 is that if the decision to stop using Z were immediate, as soon as
$\min(\lambda_1, \lambda_2)$ happens, there would be no error coupling event and the composite procedure would have
false alarm $\alpha$. But due to delayed detection, there is a period of time when subsystem 1 can declare
a fault due to a change only in Z but not in X. This is exactly $\mathcal{E}_c = \{\lambda_2 \le \nu_1 < \lambda_1, \nu_1 \le \nu_2\}$, the
error event coupling the tests between subsystems. If the probability of $\mathcal{E}_c$ decays with $\alpha$ faster than $O(\alpha)$
(the procedure is strong), the additional false alarm incurred is not significant, since delay is proportional
to the logarithm of the false alarm. Otherwise, the error incurred is significant, and reduces any potential
delay benefits. Theorem 3 completes the understanding of the false alarm for procedure STIE by
analyzing the error coupling probability and identifying under what conditions the procedure is strong,
and therefore has false alarm rate of order $\alpha$.

Theorem 3 (Error coupling probability regularity). The theorem is stated for subsystem 1. For subsystem 
2 it suffices to exchange the role of X and Y. 

(a) The procedure STIE is a regular fault detection procedure. 

(b) Let Assumptions VII.2 and VII.5 hold. Define $b_1 = q_0(X) - q_1(Z) + d_1$ and the rate

$$r_a^* = \frac{1}{w^*}\, \frac{[\min\{q_0(X), q_1(Z)\} + q_1(Y) + d_1 - d_2]^2}{\max\{\sigma_0^2(X), \sigma_1^2(Z)\} + \sigma_1^2(Y)},$$

where 



™* = \i —r j;/;;;: ^ h^w,^^} + <&oo + * - ^ - h, 



af(X)+aj(Z) 
m^{a^X),al(Z)}+al(Yy 

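constants $\sigma_0^2(X)$, $\sigma_1^2(Z)$ and $\sigma_1^2(Y)$ are defined in Assumption VII.2 and constants $d_1$ and $d_2$ are
defined in Assumption VII.1. Then the error coupling probability $\xi_{\lambda_1,\lambda_2}(\bar{\nu}_1)$ of Definition 5 satisfies

$$\lim_{\alpha \to 0} \frac{\log \xi_{\lambda_1,\lambda_2}(\bar{\nu}_1)}{\log \alpha} \ge r^*,$$

where

(a) If $b_1 < 0$ then $r^* = r_a^*$;

(b) If $b_1 > 0$ then $r^* = \max(r_a^*, r_b^*)$, where

$$r_b^* = \frac{4 b_1}{\sigma_1^2(X) + \sigma_1^2(Z)}.$$

Therefore, if $r^* > 1$, STIE is a strong fault detection procedure.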

The main element in proving this theorem is the identification of which types of events cause strong
error coupling. A key result (Lemma 1) shows that for STIE, conditions on the amount
of information provided by the different information sets suffice for the various types of errors to be
decoupled.

Example. Let us consider a simple scenario where $\sigma^2(X) = \sigma_1^2(Z) = \sigma^2(Y) = 1/2$ and $d_1 = d_2 = \epsilon$,
where $\epsilon$ is small. When the shared information is stronger than the private information (i.e., $q_0(X) <
q_1(Z)$), and if $q_0(X)$ is small, then $b_1 < 0$ and $r^* \approx q_1(Y)^2/(q_1(Y) + q_1(Z))$. So the private information
of subsystem 2 needs to be large as well (i.e., $q_1(Y) = O(\sqrt{q_1(Z)})$) for the procedure to be strong.
Intuitively, this prevents subsystem 1 from being misled by a fault in subsystem 2, since subsystem 2
quickly detects its own fault. Otherwise, if $q_0(X) \gg q_1(Z)$, then $r^* \ge r_b^*$ and $r_b^* \approx 4 q_0(X)$,
so the procedure is strong if sufficient private information is available to subsystem 1, independent of
subsystem 2. If $q_0(X)$ is small in this case, then the procedure still benefits from the strength of the private
information of subsystem 2.

In more general scenarios, the error coupling events can be inferred from the fault graph structure.
Then, if the probability of error coupling is small, the inference problem for each subsystem can be
decoupled, and thus behaves as multiple single fault problems. A procedure like STIE is then strong if
both the private and the shared information (X, Y or Z) are sufficiently strong.

B. Performance analysis: detection delay 

The performance of the individual procedures that compose STIE is known under the condition that no change
occurs in the competing subsystem. For example, for subsystem 1 if $\lambda_2 = \infty$, it is clear that the standard
delay computation in Eq. (4) applies to stopping rules $\nu_1$ and $\hat{\nu}_1$ with appropriately chosen constants.
We can define the detection delay constants for each individual change point that composes STIE:

Definition 6 (Detection delays). Define the following detection delay constants:

$$L_1^\alpha = \frac{|\log \alpha|}{q_1(X) + q_1(Z) + d_1}, \qquad L_2^\alpha = \frac{|\log \alpha|}{q_1(Y) + q_1(Z) + d_2},$$

$$\hat{L}_1^\alpha = \frac{|\log \alpha|}{q_1(X) + d_1}, \qquad \hat{L}_2^\alpha = \frac{|\log \alpha|}{q_1(Y) + d_2},$$

where $d_1$ is the rate for prior $\pi_1$ and $d_2$ is the rate for prior $\pi_2$ according to Assumption VII.1.

Based on this definition and under the condition $\lambda_2 = \infty$, notice that $D^{\lambda_1,\lambda_2}(\nu_1) = L_1^\alpha$ and $D^{\lambda_1,\lambda_2}(\hat{\nu}_1) = \hat{L}_1^\alpha$.
Furthermore, $D^{\lambda_1,\lambda_2}(\nu_1) \le D^{\lambda_1,\lambda_2}(\hat{\nu}_1)$. $L_1^\alpha$ is the smallest delay achievable in this problem as it assumes
the shared information only changes due to $\lambda_1$. $\hat{L}_1^\alpha$ is the delay achieved by $\hat{\nu}_1$ in the general scenario
since it does not use the shared information. Thus, we expect $D^{\lambda_1,\lambda_2}(\nu_1) \le D^{\lambda_1,\lambda_2}(\bar{\nu}_1) \le D^{\lambda_1,\lambda_2}(\hat{\nu}_1)$, as
when $\lambda_2$ happens much earlier than $\lambda_1$, STIE will use $\hat{\nu}_1$ for subsystem 1. This might even be the case
for any procedure respecting the false alarm bound.
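
The delay constants of Definition 6 all share one form: $|\log \alpha|$ divided by the sum of the information rates that the test uses and the tail rate of the prior. The short sketch below evaluates them; the helper name and the numerical values are hypothetical and chosen only to show that dropping the shared information increases the first-order delay.

```python
import math


def delay_constant(alpha, info_rates, prior_rate):
    """First-order delay: |log(alpha)| / (sum of information rates + prior tail rate)."""
    return abs(math.log(alpha)) / (sum(info_rates) + prior_rate)


# Hypothetical information strengths and prior rate for subsystem 1.
alpha = 1e-7
q1_X, q1_Z, d1 = 2.5, 0.5, 0.01

L_shared = delay_constant(alpha, [q1_X, q1_Z], d1)   # test using X and Z
L_private = delay_constant(alpha, [q1_X], d1)        # test using X only
```

Because the denominators differ only by $q_1(Z)$, the ratio of the two constants quantifies the largest delay reduction the shared information can provide.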

It is natural to start the analysis by determining an asymptotic lower bound for the detection delay 
of any procedure for multiple interacting fault detection. The minimization is constrained by the desire 
for the procedure to incur false alarm at most $\alpha$. For this to hold we consider only procedures in an
appropriate false alarm class: 

Definition 7 (False alarm classes). For stopping times $\nu_1(X, Z)$ dependent only on X and Z define the
classes:

(i) $\Delta_1(\alpha)$ such that $P_{FA}^{\pi_1,\infty}(\nu_1) \le \alpha$,

(ii) $\Delta_1(\alpha, k_2)$ such that $P_{FA}^{\pi_1,k_2}(\nu_1) \le \alpha$,

(iii) $\bar{\Delta}_1(\alpha)$ such that $P_{FA}^{\pi_1,\pi_2}(\nu_1 \mid \lambda_2 \le \lambda_1) \le \alpha$.

Also, define similar classes for stopping times $\nu_2(Y, Z)$ dependent on Y and Z.

Using the definition we can prove a performance lower bound for our problem among certain classes
of procedures as shown in Theorem 4. The lower bound guarantees that no procedure that belongs in 
the given class can have delay smaller than stated. It gives us a certificate against which to check the 
optimality of a given procedure. 

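Theorem 4 (Delay lower bound). Let Assumptions VII.1 and VII.3 hold. Denote $C_d = (1 + o(1))$ and consider
the classes in Definition 7. Then for subsystem 1 as $\alpha \to 0$:

$$\inf_{\nu_1 \in \Delta_1(\alpha, k_2)} E_{k_1,k_2}[(\nu_1 - k_1)^m \mid \nu_1 \ge k_1] \ge [(L_1^\alpha)^m I(k_1 \le k_2) + (\hat{L}_1^\alpha)^m I(k_1 > k_2)]\, C_d,$$

$$\inf_{\nu_1 \in \Delta_1(\alpha) \cap \bar{\Delta}_1(\alpha)} E_{\lambda_1,\lambda_2}[(\nu_1 - \lambda_1)^m \mid \nu_1 \ge \lambda_1] \ge [(L_1^\alpha)^m P(\lambda_1 \le \lambda_2) + (\hat{L}_1^\alpha)^m P(\lambda_1 > \lambda_2)]\, C_d,$$

where $L_1^\alpha$ and $\hat{L}_1^\alpha$ are the delay constants of Definition 6 with and without the shared information.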



A similar result holds for subsystem 2. 

The lower bound can be intuitively understood since when $\lambda_2 < \lambda_1$, the shared information does not
help in identifying the change in subsystem 1 for arbitrarily small false alarm $\alpha$. Notice the procedure $\bar{\nu}_1$
in STIE may not belong to $\Delta_1(\alpha)$, since the bound for its false alarm rate is greater than $\alpha$ and, more
importantly, it depends on all three of X, Z and Y by definition. But $\nu_1$ and $\hat{\nu}_1$ do belong to $\Delta_1(\alpha)$, and
$\nu_2$ and $\hat{\nu}_2$ belong to $\Delta_2(\alpha)$.

We conclude the section by computing the asymptotic delay of the procedure STIE. The main challenge
in this analysis is to account for the various possible combinations of change times $\lambda_1$ and $\lambda_2$ generating
different choices in the composite procedure STIE. Theorem 5 computes the detection delay of STIE
under this general setup.

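Theorem 5 (Performance of STIE). Let Assumptions VII.1 and VII.4 hold. Consider the procedure STIE
represented as the set of stopping times $(\bar{\nu}_1, \bar{\nu}_2)$. The delay of STIE as $\alpha \to 0$ is given by:

$$D_m^{\lambda_1,\lambda_2}(\bar{\nu}_1) = \left[ D_m^{\lambda_1,\infty}(\nu_1)(1 - \delta_\alpha) + D_m^{\lambda_1,\infty}(\hat{\nu}_1)\,\delta_\alpha \right](1 + o(1)),$$

where $\delta_\alpha = P_{\lambda_1,\lambda_2}(\nu_1 > \nu_2)$, $D_m^{\lambda_1,\infty}(\nu_1) = (L_1^\alpha)^m$ and $D_m^{\lambda_1,\infty}(\hat{\nu}_1) = (\hat{L}_1^\alpha)^m$ with the delay constants
of Definition 6, and $d_1$ is given in Assumption VII.1. The results are also valid for $\lambda_1$ and $\lambda_2$ replaced by
$k_1$ and $k_2$. For subsystem 2, analogous results apply.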

The proof of the theorem relies on careful use of concentration arguments and the fact that STIE is a
regular procedure. Notice that the asymptotic performance differs from the lower bound only in the
factor $\delta_\alpha$.
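
To first order, Theorem 5 says the delay moment of STIE is a mixture of the delay with shared information and the delay without it, weighted by the probability $\delta_\alpha$ that subsystem 2 declares first. The sketch below evaluates that mixture, dropping the $(1 + o(1))$ factor; the function name and argument names are hypothetical.

```python
def stie_delay_moment(L_shared, L_private, delta, m=1):
    """First-order m-th delay moment of STIE for one subsystem.

    L_shared:  delay constant of the test using private and shared data
    L_private: delay constant of the test using private data only
    delta:     probability that the other subsystem declares first
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must be a probability")
    return (L_shared ** m) * (1.0 - delta) + (L_private ** m) * delta
```

With `delta = 0` the delay reduces to that of the shared-information test, and with `delta = 1` to that of the private-only test, matching the two extreme cases discussed before Theorem 4.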



Remark 2. It is easy to see that in a symmetric problem (i.e. $q_1(X) = q_1(Y)$ and $\pi_1 = \pi_2$), $\delta_\alpha =
P(\lambda_1 < \lambda_2) = 1/2$, and therefore the proposed procedure achieves the delay lower bound, albeit with a
potentially larger false alarm. In fact, under the conditions for which STIE is strong, it is actually
an asymptotically optimal procedure for the symmetric problem, since the false alarm is bounded by $\alpha$. We
conjecture this is not the case for more general scenarios, and $\delta_\alpha$ will perhaps depend on the difference
between the delays of $\nu_1$ and $\nu_2$.



V. Examples 

Fig. 3. Simulation example: (a) Sample path for correlation with change point at n = 50, (b) Error coupling probability
estimates for different variance ratios and (c) Error coupling probability exponent estimates. Average delay comparison for
false alarm $\alpha = 10^{-7}$ between (d) theory and simulation; (e) simulation including and excluding shared information Z and
(f) theory including and excluding shared information. Uncertainty ratio in these figures refers to the quantity $\sigma_Z^2/\sigma_S^2$, where
$\sigma_X^2 = \sigma_Y^2 = \sigma_S^2$.

We evaluate the performance of our algorithm in simulations, which allows us to precisely specify the
moment of failure. For the system in Figure 2 assume that $f_0(X) \sim \mathcal{N}(\mu(X), \sigma^2(X))$ and $f_1(X) \sim
\mathcal{N}(0, \sigma^2(X))$. Similarly, we make definitions for Y and Z. In this case, the information strength for the
private information X is given by

$$q_1(X) = q_0(X) = \frac{\mu^2(X)}{2\sigma^2(X)},$$

and similarly for the other information sets. Using the results obtained in Section IV-A, we can conclude
that STIE is a strong fault detection procedure if

$$4\,\frac{q_0(X) - q_1(Z)}{\sigma^2(X,Z)} > 1 \quad \text{and} \quad 4\,\frac{q_0(Y) - q_1(Z)}{\sigma^2(Y,Z)} > 1$$

whenever $q_1(X) > q_1(Z)$ and $q_1(Y) > q_1(Z)$. Let us assume $\mu(X) = \mu(Y) = \mu(Z) = 1$ to normalize
the simulation variables. $\sigma^2(X,Z)$ is the variance of the log-likelihood under the after-change measure
for Z and the pre-change measure for X, which can be computed as

$$\sigma^2(X,Z) = \frac{1}{\sigma^2(X)} + \frac{1}{\sigma^2(Z)},$$

obtaining the conditions

$$\sigma^2(X) \le \tfrac{1}{3}\sigma^2(Z) \quad \text{and} \quad \sigma^2(Y) \le \tfrac{1}{3}\sigma^2(Z).$$

The result can be interpreted intuitively if we consider that the private information X focuses on capturing
the behavior of the change time $\lambda_1$ for subsystem 1, and similarly for Y and subsystem 2. These information
sets do not couple $\lambda_1$ and $\lambda_2$. Thus, the condition implies that the information strength of these sources
has to be at least three times the information strength of the shared information to avoid the coupling
probability becoming too large.
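
The factor of three can be checked numerically. The sketch below assumes the Gaussian mean-shift model of this section (pre-change mean $\mu$, post-change mean 0, common variance), for which the information strength is $\mu^2/(2\sigma^2)$ and the variance of a single log-likelihood ratio term is $\mu^2/\sigma^2$. It also assumes the strong-procedure condition takes the form $4\,(q_0(X) - q_1(Z))/\sigma^2(X,Z) > 1$ with the prior rate neglected; the function names are hypothetical.

```python
def kl_strength(mu, var):
    """Information strength (KL divergence) between N(mu, var) and N(0, var)."""
    return mu ** 2 / (2.0 * var)


def llr_variance(mu, var):
    """Variance of one Gaussian log-likelihood ratio term for a mean shift of mu."""
    return mu ** 2 / var


def is_strong(var_private, var_shared, mu=1.0):
    """Check 4 * (q0(X) - q1(Z)) / sigma^2(X, Z) > 1 for unit-normalized means."""
    q_private = kl_strength(mu, var_private)
    q_shared = kl_strength(mu, var_shared)
    sigma2 = llr_variance(mu, var_private) + llr_variance(mu, var_shared)
    return 4.0 * (q_private - q_shared) / sigma2 > 1.0
```

With the shared variance fixed at 1, the check passes for a private variance of 0.3 and fails for 0.4, consistent with the threshold of one third.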

For the numerical simulation, we further assume that the random sequences X and Y are i.i.d. with
variance $\sigma_S^2$. The shared information Z has a fixed variance $\sigma_Z^2 = 1$. The priors for $\lambda_1$ and $\lambda_2$ are
exponential distributions with rate $d_1 = d_2 = -\log(0.01)$. Figure 3(a) shows a typical sample path of
private information when $\sigma_S^2 = 0.2$. Notice that without time averaging it is very hard to say exactly
when the change (failure) occurred.

In Section IV-A we argued that the error coupling probability should go to zero as the false alarm
rate $\alpha \to 0$ for the procedure to be consistent, and we see this in Figure 3(b). Notice though that the
rate depends on the private information variance $\sigma_S^2$. From Figure 3(c), if $\sigma_Z^2/\sigma_S^2 < 1.8$,
the error coupling probability is $O(\alpha^p)$ with $p < 1$, so the total false alarm rate of the procedure decays
more slowly than $\alpha$. But for higher ratios, our procedure has false alarm rate of order $\alpha$, since the false alarm is the
sum of $\alpha$ and the error coupling probability. To achieve higher ratios it is valuable to increase the private
information strength, i.e., the strength of information that responds to a single fault. The theoretical
prediction guarantees that the procedure is strong for ratio $\sigma_Z^2/\sigma_S^2 > 3$.

Figure 3(d) shows the theoretical and experimental average delays obtained when the threshold is
$\alpha = 10^{-7}$. There is disagreement between the curves, although the qualitative behavior is as expected.
The disagreement arises because our results hold for $\alpha \to 0$. This discrepancy is well known in sequential
analysis [25]. Figure 3(e) compares the behavior of our procedure using the shared information Z and one 
that does not use it at all. There is a substantial reduction in delay using shared information. Figure 3(f) is 
the corresponding theoretical prediction. There is a qualitative agreement between theory and simulation 
experiment. 

VI. Discussion 

In this paper we developed a procedure for the multiple interacting fault detection problem. We proposed 
a set of basic assumptions and a framework based on the notion of a fault graph together with fundamental 
metrics to evaluate the performance of any sequential fault detection procedure. Then we proceeded to 
analyze the efficient algorithm STIE that achieves a good performance under the proposed metrics, and 
even an optimal performance under certain scenarios. As far as we know, this is the first derivation of 
bounds on detection delay subject to false alarm constraints in a multiple fault or multiple change point 
setting. 

One of the main contributions of the paper is to develop a model that includes simultaneous change
points that interact to generate changes in the observations. This interactive aspect is novel and leads
to a rich set of models that extend single change point modeling. Furthermore, the constraint on the
information exchanged between the various tests leads to sequential tests for simultaneous hypotheses
that use inconsistent views of the probability distributions.

The proposed statistical model and algorithms introduce many new ingredients into the detection 
literature. Due to the simultaneous and interacting change points, we develop careful asymptotic stopping 
time comparison calculations. Moreover, the proposed stopping times are allowed to exchange information 
via a network, and influence each other's behavior. This information sharing introduces coupling of the 
false alarm error between the procedures, and we contend that networked procedures will work well when 
the coupling event has a small probability. The main advantage of following such an approach is that the
analysis of the decoupled problem can benefit from the many tools developed for single change points,
while care must be taken for the analysis in the coupled regime.

In the context of detection of faults in sensor networks, our algorithm performs an implicit averaging of
the history of observations, reducing the detection delay for a fixed false alarm. Many proposed practical
methods in the literature do not perform this averaging, and therefore are subject to longer delays. Our
algorithm and framework are general enough that even model based methods for computing scores, such
as the one proposed in [26] or the primitive in [8], can benefit from the proposed procedure. Compared
to procedures such as in [6], [17], our method benefits from implicit averaging, whereas those methods
make sequential decisions based on only the current observation.

Important questions for future work include proposing and analyzing an algorithm for general fault
graphs and general communication graphs. The current framework seems to naturally lead to problems
of detecting functionals of various change points observing variables whose distributions depend on these 
functionals. Moreover, it will also be interesting to analyze the behavior of STIE in more general graph 
settings. We have successfully applied the algorithm in practice [21] to more general instances of the 
multiple interacting fault detection problem. 



VII. Proofs 

A. Technical Assumptions 

Some technical assumptions are required in order to obtain performance estimates for the procedures 
proposed. The first assumption is that priors have tail bounds. 

Assumption VII.1. The priors $\pi_1$ and $\pi_2$ of subsystems 1 and 2 satisfy the tail limits:

$$\lim_{n \to \infty} \frac{\log \Pi_1(n)}{n} = -d_1,$$

$$\lim_{n \to \infty} \frac{\log \Pi_2(n)}{n} = -d_2.$$

The next assumption is on the tails of the log-likelihood random variables.

Assumption VII.2. Assume log likelihood ratios are independent and have finite first and second moments.
Denote the variance of the likelihood ratio of X under $f_0$ by $\sigma_0^2(X)$ and under $f_1$ by $\sigma_1^2(X)$, of Y by
$\sigma_0^2(Y)$ and $\sigma_1^2(Y)$ and of Z by $\sigma_0^2(Z)$ and $\sigma_1^2(Z)$. For concreteness, consider the likelihood ratio for X,
$R_n^r(X)$. Then we assume the following tail bounds exist for $x \ge \mu_n^r(X)$,

$$P_{k_1,k_2}(R_n^r(X) > x) \le K(X) \exp\left\{ -\frac{(x - \mu_n^r(X))^2}{\sigma_n^r(X)^2} \right\}$$

where

$$\mu_n^r(X) = (n - k_1 \vee r)\, q_1(X) - (k_1 - k_1 \wedge r)\, q_0(X),$$
$$\sigma_n^r(X)^2 = \gamma(X)\{(n - k_1 \vee r)\, \sigma_1^2(X) + (k_1 - k_1 \wedge r)\, \sigma_0^2(X)\}.$$

Similar bounds hold for Y and Z, with $\mu$ and $\sigma$ appropriately defined. Also, we assume the bounds
for sums, such as $R_n^r(X) + R_n^r(Z)$, by again using the appropriate definitions, such as $\mu_n^r(X,Z) =
\mu_n^r(X) + \mu_n^r(Z)$ and $\sigma_n^r(X,Z)^2 = \sigma_n^r(X)^2 + \sigma_n^r(Z)^2$. The constants for the bounds are defined as $K(X,Z)$
and $\gamma(X,Z)$.

Remark 3. The tail bound assumption is not overly restrictive. In fact, it only imposes a light tail
constraint on the individual likelihood random variables, and then uses independence. For example, if $f_0$
and $f_1$ are Gaussian densities, the tail bounds can be obtained from large deviations. If all log likelihood
ratios are bounded within the interval $[-M, M]$, then using Hoeffding's bound [7], we can obtain a similar
bound for each random variable, except that in this case $\gamma(X) = 2$ and

$$\sigma_n^r(X)^2 = 2\{(n - k_1 \vee r)\, \sigma_1^2(X) + (k_1 - k_1 \wedge r)\, \sigma_0^2(X) + M/3\}.$$

The rationale behind these assumptions is that they allow precise computation of the probability of
deviations of the likelihood ratio sequence, including when the maximum crosses a threshold.
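
For the Gaussian case, a bound of the assumed form follows from the Chernoff bound: a Gaussian variable with mean $\mu$ and variance $\sigma^2$ satisfies $P(R > x) \le \exp\{-(x - \mu)^2/(2\sigma^2)\}$ for $x \ge \mu$, which corresponds to constants $K = 1$ and $\gamma = 2$. The sketch below compares the exact tail with this bound; the function names and the numerical values in the loop are hypothetical.

```python
import math


def gaussian_tail(x, mean, var):
    """Exact P(R > x) for R ~ N(mean, var)."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2.0 * var))


def tail_bound(x, mean, var, K=1.0, gamma=2.0):
    """Bound K * exp(-(x - mean)^2 / (gamma * var)), valid for x >= mean."""
    return K * math.exp(-((x - mean) ** 2) / (gamma * var))


# Compare the exact tail and the bound at a few deviations above the mean.
for dev in (0.0, 0.5, 1.0, 2.0, 5.0):
    exact = gaussian_tail(3.0 + dev, 3.0, 4.0)
    bound = tail_bound(3.0 + dev, 3.0, 4.0)
    print(f"deviation {dev}: exact {exact:.3e}, bound {bound:.3e}")
```

For a sum of independent log-likelihood ratio terms, the same bound applies with the mean and variance of the sum, which is how the drift $\mu_n^r$ and the variance $\sigma_n^r$ enter the assumption.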

We then assume different forms of expectation concentration of the log-likelihood[12], [25]. 

Assumption VII.3. For all $\epsilon > 0$ and $k_1, k_2 \ge 1$, as $N \to \infty$:



fei,fc 2 



o, 

• 0. 



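Assumption VII.4 (r-quick convergence of LLR). The log-likelihood ratios $R^{k_1}_{k_1+n-1}(X)$, $R^{k_1 \wedge k_2}_{k_1 \wedge k_2+n-1}(Z)$
and $R^{k_2}_{k_2+n-1}(Y)$ define the stopping times:

$$T_\epsilon^{(k_1,k_2)}(X) = \sup\left\{ n \ge 1 : \left| \tfrac{1}{n} R^{k_1}_{k_1+n-1}(X) - q_1(X) \right| > \epsilon \right\},$$
$$T_\epsilon^{(k_1,k_2)}(Y) = \sup\left\{ n \ge 1 : \left| \tfrac{1}{n} R^{k_2}_{k_2+n-1}(Y) - q_1(Y) \right| > \epsilon \right\},$$
$$T_\epsilon^{(k_1,k_2)}(Z) = \sup\left\{ n \ge 1 : \left| \tfrac{1}{n} R^{k_1 \wedge k_2}_{k_1 \wedge k_2+n-1}(Z) - q_1(Z) \right| > \epsilon \right\}.$$

For all $\epsilon > 0$ and $k_1 \ge 1$ and $k_2 \ge 1$, for some $r \ge 1$:

$$E_{k_1,k_2}\left[ T_\epsilon^{(k_1,k_2)}(X)^r \right] < \infty, \quad E_{k_1,k_2}\left[ T_\epsilon^{(k_1,k_2)}(Y)^r \right] < \infty, \quad E_{k_1,k_2}\left[ T_\epsilon^{(k_1,k_2)}(Z)^r \right] < \infty,$$
$$E_{\lambda_1,\lambda_2}\left[ T_\epsilon^{(\lambda_1,\lambda_2)}(X)^r \right] < \infty, \quad E_{\lambda_1,\lambda_2}\left[ T_\epsilon^{(\lambda_1,\lambda_2)}(Y)^r \right] < \infty, \quad E_{\lambda_1,\lambda_2}\left[ T_\epsilon^{(\lambda_1,\lambda_2)}(Z)^r \right] < \infty.$$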



Assumption VII.5. Let 



St PO : = log g4M + /ft (X) + /ft (Z), 

5 ™ 2(y) : = log S§y + jR " 2(y) + i? " 2( ^ ) - 

Lef 771 = min{n : S kl (X) > logi^} (where B a is given by Eq. (3)), and define for arbitrary e > 0, 

T e fcl = sup{n : |(n - A* + l) -1 ^*) - + qi (Z) + d a )| > e}. 

Assume that Eqo&j expT e fcl < 00 for any e > and for any k\. Similarly, let rj2 = min{n : S^ 2 (y) > 
logi? a }, anc? define for arbitrary e > 0, 

= sup{n : |(n- + l)" 1 ^^) - (<7iCO + q x {Z) + d 2 )| > e}. 

Assume that E^ fe 2 expTf 2 < 00 for any e > and for any ki- 

B. Proof of Theorem 1

We start the proof by defining an upper bound to the test statistic A" oex (X, Z) that defines the stopping 
time v\{X,Z). Selecting ^ = ^1 A k^, using the assumption ^2(^2) > 0, we can lower bound: 

6n>n lin vr 2 (fc2)L n+1 (Xi)L S2 (Zi), 

so that simple algebra shows 
Now we can proceed as 

iogAr x (x,z) = iog^<-io g n 1 , n -iog^fe)+iog e E^^^ t fc fXM 



-log^fe) -io g n lin + io g V ti(*i) T Ul( ^"i 



r + logA„(X), 

where the original test statistic can be upper bounded by the sum of the standard Shiryaev test statistic for X
with change point at $\lambda_1$ and a positive constant r. Define the stopping time $\eta$

7/ = inf i n : log A n (X) + r > log 1 — - 1 < vAX) = inf j n : log A n (X) > log 1 — - 
[ q J [ a 

It is simple to see that rj — v\{X) — > as a — > 0. Since v\(X, Z) > rj w.p.l, we have shown 
D\ lM (y x {X,Z)) > D^iu^X)). For D^'fa^X, Z)) > D^ 2 (MX)), we have the following 
chain of inequalities using the definition of the delays: 

E fe k2 kmx, z) - kl r\Mx, z) > kl] = 

>E fclife2 [[(^(x.zj-fcij+p] 
>E fclife2 [[(17 - fci)+r] 

> (L aie ) m F kltk2 { V > fci + L Q)6 ) 

> (L Q , e ) m P fel , fc >i(X) > fci + L a>e ). 

where $L_{\alpha,\epsilon} = (1 - \epsilon)\frac{|\log \alpha|}{q_1(X) + d_1}$ and in the last line we used the Markov inequality. Lemma 4(i) also states
$P_{\lambda_1,\lambda_2}(\nu_1(X) > \lambda_1 + L_{\alpha,\epsilon}) \to 1$. Noticing this is just the expectation of the last inequality, we conclude
the proof.

C. Proof of Theorem 2

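(a) First we show item (a) for subsystem 1. The analysis is analogous for subsystem 2.

$$P_{\lambda_1,\lambda_2}(\bar{\nu}_1 < \lambda_1) = P_{\lambda_1,\lambda_2}(\bar{\nu}_1 < \lambda_1, \nu_1 > \nu_2) + P_{\lambda_1,\lambda_2}(\bar{\nu}_1 < \lambda_1, \nu_1 \le \nu_2) \qquad (14)$$

$$= P_{\lambda_1,\lambda_2}(\max(\hat{\nu}_1, \nu_2) < \lambda_1, \nu_1 > \nu_2) + P_{\lambda_1,\lambda_2}(\nu_1 < \lambda_1, \nu_1 \le \nu_2) \qquad (15)$$

$$\le P_{\lambda_1,\lambda_2}(\hat{\nu}_1 < \lambda_1, \nu_1 > \nu_2) + P_{\lambda_1,\lambda_2}(\nu_1 < \lambda_1, \nu_1 \le \nu_2) \qquad (16)$$

$$\le \alpha P_{\lambda_1,\lambda_2}(\nu_1 > \nu_2) + P_{\lambda_1,\lambda_2}(\nu_1 < \lambda_1, \nu_1 \le \nu_2) \qquad (17)$$

$$= \alpha P_{\lambda_1,\lambda_2}(\nu_1 > \nu_2) + P_{\lambda_1,\lambda_2}(\nu_1 < \lambda_1, \nu_1 < \lambda_2, \nu_1 \le \nu_2) + P_{\lambda_1,\lambda_2}(\nu_1 < \lambda_1, \nu_1 \ge \lambda_2, \nu_1 \le \nu_2) \qquad (18)$$

$$= \alpha P_{\lambda_1,\lambda_2}(\nu_1 > \nu_2) + P_{\lambda_1,\lambda_2}(\nu_1 < \lambda_1, \nu_1 < \lambda_2, \nu_1 \le \nu_2) + P_{\lambda_1,\lambda_2}(\lambda_2 \le \nu_1 < \lambda_1, \nu_1 \le \nu_2) \qquad (19)$$

$$\le \alpha + \xi_{\lambda_1,\lambda_2}(\bar{\nu}_1). \qquad (20)$$

In lines (15) and (18) we use the following observations from the definitions of $\bar{\nu}_1$ and $\bar{\nu}_2$:

$$\{\nu_1 > \nu_2\} \cap \{\bar{\nu}_1 < x\} = \{\nu_1 > \nu_2\} \cap \{\max(\hat{\nu}_1, \nu_2) < x\},$$
$$\{\nu_1 \le \nu_2\} \cap \{\bar{\nu}_1 < x\} = \{\nu_1 \le \nu_2\} \cap \{\nu_1 < x\}.$$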

In line (16) we used the fact that v\ = us(X) so (a) due to this definition P Al *(Ai < n|X|) > 1 — a for 
t > v\ (see Eq. (1)) and (b) by conditioning on data X:):, where r = max(z>L,z/i) the following bound 
holds: 

PAx.A.fa < Ai,^i > 1/2) = E[P AliA2 (z>! < Ai,^ > v 2 \& T )\ 

= EpP AliAa (& l <A 1 |X^)I(^>^)] 
< aP Ali A>i > V 2 ). 

For line (20) a similar argument applies since (a) P Al)0 o(Ai < n\X\,7i\) > 1 — a for t > v\ and (b) 
Pax,a>i < Ai,^i < A 2 ,i/i < i/ 2 ) =E[P Al)0O (i/! < Ai,W < A 2 |Xi 2 ,Zijl(i/i < i/ 2 )]. 

Proceeding in a similar fashion we can obtain the result for the false alarm of subsystem 2. 

(b) Now we can show (b) for subsystem 1. From the definition of marginal probability of false alarm 
in Definition 2 and following the proof steps in Eqns (15,16,18): 

PA l5 fc 2 (^i < Ai) < P Al ,fc 2 (^i < Ai,z/i > u 2 ) +P Aljfc2 (z/i < \ x ,vx < v 2 ) 
< aP Xuk2 (vi > vz) +PAx,fc 2 (^i < Ai,z/i < v 2 ). 
The second quantity can be bound: 

Pai^C^i < Ai,Pi < i/ 2 ) = P Alii t 2 (i/i < i/ 2 ,z/i < fci) 

= P Al ,jfc 2 (^i < Ai,i/i < fc 2 ,i/i < v 2 ) +PAx,fc 2 (A; 2 < Pi < fci,i/i < v 2 ) 
<aPA 1)fe2 (^i<i/ 2 ) + ax,fc 2 (^)- 

D. Proof of Theorem 3

$$\xi_{\lambda_1,\lambda_2}(\bar{\nu}_1) = \sum_{k_1,k_2} \pi(k_1)\pi(k_2)\, P_{k_1,k_2}(\nu_1 \le \nu_2,\ k_2 \le \bar{\nu}_1 < k_1)$$

$$= \sum_{k_1,k_2} \pi(k_1)\pi(k_2)\, P_{k_1,k_2}(\nu_1 \le \nu_2,\ k_2 \le \nu_1 < k_1)$$

$$= \sum_{k_1,k_2} \pi(k_1)\pi(k_2)\, P_{\infty,k_2}(\nu_1 \le \nu_2,\ k_2 \le \nu_1 < k_1)$$

$$\le \sum_{k_1,k_2} \pi(k_1)\pi(k_2)\, P_{\infty,k_2}(k_2 \le \nu_1 \le \nu_2)$$

$$= \sum_{k_2} \pi(k_2)\, P_{\infty,k_2}(k_2 \le \nu_1 \le \nu_2).$$

We continue the proof using Lemma 1. Given this lemma, it is immediate by the dominated convergence
theorem that as $\alpha \to 0$:

$$\xi_{\lambda_1,\lambda_2}(\bar{\nu}_1) \to 0,$$

showing that the procedure is regular, which proves (a) without Assumption VII.5. Including Assumption VII.5,
(b) follows since $P_{\infty,\lambda_2}(\lambda_2 \le \nu_1 \le \nu_2) = \sum_{k_2} \pi(k_2)\, P_{\infty,k_2}(k_2 \le \nu_1 \le \nu_2)$. A similar proof can be shown
for subsystem 2.

E. Lemma 1 (Event Decoupling Lemma) 

Lemma 1. Let Assumption VII. 2. For any k 2 > 0, the following bound holds: 

logPoc.^02 < V\ < V 2 ) # 

lim ■ — > r . 

a->o log a 

Let Assumption VII. 2 and VII.5. Then: 

logPoo A^fo < V X < V 2 ) „ 

lim > r . 

a^o log a 

Proof: The proof has five parts. In the first part we decompose the probability into three tail events
that determine the $\alpha$-order of the error coupling probability. The point at which we switch between the
first two events is a parameter ($C_\alpha$) that needs to be optimized. For each event we compute upper bounds
on the probabilities and the rate function for the speed with which the error coupling probability converges
to zero as $\alpha \to 0$. Using rate matching, we optimize the free parameter $C_\alpha$. Finally, we determine the
parameter ($\bar{C}_\alpha$) at which one switches from the second to the third event, based on the choice of
the optimized parameter.

Decomposing the event decoupling lemma into 3 events. First notice that (we consider $C_\alpha = \infty$ a
valid possibility):

$$P_{\infty,k_2}(k_2 \le \nu_1 \le \nu_2) \le P_{\infty,k_2}(k_2 \le \nu_1 \le \nu_2,\ \nu_2 \le k_2 + C_\alpha) + P_{\infty,k_2}(\nu_2 > k_2 + C_\alpha).$$

We decompose further the quantity:

$$P_{\infty,k_2}(k_2 \le \nu_1 \le \nu_2,\ \nu_2 \le k_2 + C_\alpha) \le P_{\infty,k_2}\left( \bigcup_{l=k_2}^{k_2+C_\alpha} \{\Lambda_l(X,Z) > B_\alpha\} \cap \{\Lambda_l(Y,Z) < B_\alpha\} \right),$$
where the bound follows from the definition of $\nu_1$ and $\nu_2$. The advantage of this particular bound is that
for small l, the first event of the intersection (subsystem 1 mistakenly crossing the threshold) has small
probability, and for large l the second does (subsystem 2 not crossing the threshold before subsystem
1). From the definition of the test quantities (Definition 4), we obtain the bounds:

$$\log \Lambda_n(X,Z) \le -\log \Pi_1(n) + \max_{r \in [1,n]} \{R_n^r(X) + R_n^r(Z)\}.$$

Now we can continue to bound: 

k 2 +C a 

Poo,k 2 ( k 2 < "I < V2, V2<k 2 + C a ) < p oo,fe 2 ({A/(X, Z) > B a } n {Ai(Y, Z) < B a }) 

l=k 2 

k 2 +C a 

< Y\ Poo^,({-logni(Z) + max {R r l (X) + R r l (Z)} > lo gj B a }n 

l=k 2 ' re[1 ' l] 

{-lo g n 2 (/) + logvr 2 (r) + R\(Y) + R\(Z) < logB a , Vr < I}) 

k 2 +C a 

< V F 00tk2 {{-\ogU 1 (l)+max{R r l (X) + R r l (Z)}>\ogB a }n 

tC 2 re[1 ' l] 

{- to g n x (Z) + lo g n 2 (0 - lo g 7r 2 (A; 2 ) + max{i?[(X) + R\{Z)} - R^(Y) - R^(Z) > e}) 

re[l,l] 

k 2 +C a 



< V Poo,fe [m a x{R r l (X) + R r l (Z)}>\ogB a + \ogU 1 (l)) + 

Xf Poo,*, (ws* W(X) + R\{Z)} - Rf 2 (Y) - R\\Z) > v) , 



l=k L 

k 2 +C, 



l=k 2 +C a 

where = e + log IIi (Z) - lo g n 2 (/) + log7r 2 (/c 2 ). 

Analyzing the probability of early crossing for subsystem 1 (event £\). Lemma 2 will be used to 
bound the first probability in the inequality. Define bo = qo(Z) + qo(X) and b\ = qo(X) — qi(Z) + d\. 
Apply Assumption VII.2 and Lemma 2, with a = a\ = \ogB a + log IIi(Z) — (Z + l)d\, b = b\, 
c = Cl = (k 2 - 1) a%(X, Z) and d = d x = af(X, Z): 



Vt, max { Rr i( x ) + R r i(z)} > logfla + iogn^z) 

\re[l,l] 



< I max P Mife {R\{X) + R\(Z) > \ogB a + loglli(Z)) 

r€[l,l] 



< I max max K(X, Z) exp 



(log B a + log IIi (Z) - (Z + 1) <h + (Z - r + 1) bi + s bp 



,2 



se[o,fc 2 -i]re[fc 2 ,i] 1 (Z - r + 1) af(X, Z) + sa5(A", Z) j 

/ (log B a + log Hi(Z) -(Z + l)rf 1 + (Z-r + l)6 1 ) 2 \ 

< l K(X. Z) max exp < ; : — ^ ; r — ^ : > . (21) 

" V ' \e[k 2 ,l] l \ (l-r + l)a 2 1 {X,Z) + {k 2 -l)a^{X,Z) j 

Let us assume that $C_\alpha = \log B_\alpha / w$ for some constant $w > 0$. We can then control the bound using
Eq. (21) and a simple observation:

k 2 +C, 



V Poo,fc 2 (max{R r l (X)+R r l (Z)}>logB a + logIl 1 (l)] < (C a ) 2 exp- mm <Z> a (l), 
and ^q, is given by 

(logB a + A c (K a )+K a h) 2 
a K a al{X,Z) + {k 2 -l)al{X,Zy 

A c {K a ) = logIIi(K Q + k 2 ) - (K a + k 2 ) dx. 

The constant $K_\alpha$ is chosen so as to minimize $\Phi_\alpha$ under the constraint that $0 \le K_\alpha \le C_\alpha$. By the assumption on the
tail of the prior, there exists T such that for all $K_\alpha > T$, $|A_c(K_\alpha)| < \epsilon$. We are in this regime. Consider the
case $b_1 > 0$. Our previous calculation shows the minimum is achieved when $K_\alpha = (\log B_\alpha - \epsilon)/b_1 - 2(k_2 -
1)\,\sigma_0^2(X,Z)/\sigma_1^2(X,Z)$. For vanishing $\alpha$, $K_\alpha < C_\alpha$ if $b_1 > w$, else we should set $K_\alpha = \log B_\alpha / w$ to
minimize $\Phi_\alpha$. Lemma 2 can be used to compute the rate at the minimum when either $b_1 > w$ or $b_1 < w$:

, , bj f logB a -e (k 2 -l)a 2 (X, Z) \ 

= 4 ^Z) \~~bi o*(X,Z) ' f ° r h > 

(log^[l + ^]-^) 2 



v 7 , for bi < w. 

\ogB a ^§^ + (k 2 -l)a 2 (X,Z) 

The rate that the probability goes to zero is then calculated as: 

-log[(C Q ) 2 e X p-<P a ] / 4^^j for h>w, 

n tin = lim = < r l n2 (22) 

1 ' «~° l °sB a \ ^[1 + ^] 2 for h< W . 

We can proceed similarly for the case $b_1 < 0$. Notice that to obtain a vanishing probability now, we need
$K_\alpha < \log B_\alpha / (-b_1)$, so the only interesting case is when $w > -b_1$ (else $\Phi_\alpha = 0$ is the minimum). Since
for $b_1 < 0$ the function first decreases to the minimum, we can conclude that in this case:

bi 



2,-— -.. tl2 



/ \ ,. - log[(C a ) 2 exp w 
r\[w) = hm — 



1 + 

w 



(23) 



a^O logB a vf{X,Z) 
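As a rough numerical check of the two regimes, the following Python sketch evaluates the exponent at its constrained minimizer for a large value of $\log B_\alpha$ and compares the resulting normalized rate with the closed forms read from Eqs. (22) and (23). The function names, the parameter values, and the choice to drop the vanishing term $A_c(K_\alpha)$ are assumptions made for illustration only; `var1` and `var0` stand for $\sigma_1^2(X,Z)$ and $\sigma_0^2(X,Z)$.

```python
import math


def phi(K, log_B, b1, var1, var0, k2):
    # Exponent (log B + K*b1)^2 / (K*var1 + (k2-1)*var0).
    # The bounded term A_c(K) is dropped (assumption: it does not affect the limit).
    return (log_B + K * b1) ** 2 / (K * var1 + (k2 - 1) * var0)


def constrained_minimizer(log_B, w, b1, var1, var0, k2):
    # Unconstrained minimizer from Lemma 2, clipped to [0, C] with C = log B / w.
    C = log_B / w
    if b1 > 0:
        x_min = log_B / b1 - 2.0 * (k2 - 1) * var0 / var1
    else:
        x_min = log_B / (-b1)
    return min(max(x_min, 0.0), C)


def r1_numeric(log_B, w, b1, var1, var0, k2):
    # -log[(C)^2 exp(-Phi)] / log B evaluated at the constrained minimizer.
    C = log_B / w
    K = constrained_minimizer(log_B, w, b1, var1, var0, k2)
    return (phi(K, log_B, b1, var1, var0, k2) - 2.0 * math.log(C)) / log_B


def r1_closed_form(w, b1, var1):
    # Closed forms as read from Eqs. (22) and (23); b1 = 0 is not covered.
    if b1 > w:
        return 4.0 * b1 / var1
    if b1 < 0 and w <= -b1:
        return 0.0  # the exponent reaches zero inside the constraint set
    return (w / var1) * (1.0 + b1 / w) ** 2


if __name__ == "__main__":
    for w, b1 in [(0.5, 1.0), (2.0, 1.0), (2.0, -1.0)]:
        print(w, b1, r1_numeric(1e8, w, b1, 2.0, 1.0, 5), r1_closed_form(w, b1, 2.0))
```

The gap between the numeric and closed-form values shrinks like $1/\log B_\alpha$, which is the sense in which the rates are asymptotic.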

Analyzing the probability of subsystem 2 crossing after subsystem 1 (event $\mathcal{E}_2$). Let $V_l = \epsilon + \log \Pi_1(l) - d_1 l - \log \Pi_2(l) + d_2 l + \log \pi_2(k_2)$, $q_y(l) = (l - k_2 + 1) q_1(Y)$ and $\sigma_y^2(l) = (l - k_2 + 1) \sigma_1^2(Y)$. Similarly, for the second probability, we bound:

P^ (maxli^X) + R\{Z)} - Bt>(Y) - R^(Z) > 

\re[l,l] J 

< I max Poo,fe 2 (Ri{X) + R\{Z) - R**(Y) - R^(Z) > V t ) 

<1 \ (Vi + (l-r + l)q (X) + q y (l) + [k 2 - r] + q (Z) + [r - k 2 ] +qi (Z)f \ 

" ' ' X|> ' (l~r + l)a 2 (X) + o*{l) + [k 2 - r] + el{Z) + [r - k 2 ] + o 2 {Z) ] 

( (Vi + (l- r)q (X) + (r - k 2 ) qi (Z) + q y (l) + q (X)) 2 \ 
(I - r) o*(X) + raf(Z) + o*(l) + fc 2f x 2 (Z) + a 2 (X) J 

/ (A e (l)+l[q l *+q 1 (Y) + d 1 -d 2 ]) 2 \ 



r 



•e[i,J] 



< Z max exp 
re[i,i] ' 






where g f . = mm( fo (I), qi{Z)), a% = max(ag(X), <7?(Z)), 4 e (Z) = Vi - [gi CO + gi (Z)] + g (X) + 
qi (Y) and C7 e = k 2 [a 2 (Z) - a 2 (Y)] + a 2 (X) + af(y). 

To continue the analysis, we compute the rates for the second major event: 

k 2 +C a 



=fc 2 +C Q 



max{i?[(X) - i?f 2 (y) - i?f 2 (Z) > VJ < (C Q - C a ) z exp - min $ a , where 

re[l,i] 



A e (K Q ) + tf a [ ft . + gi(y) + d x 
C e + K a [a 2 ,+a 2 (Y)} 



do 



Lemma 2 implies the minimum in this case is at K a = A e (l)) / (gj» + gi(y) + di — d 2 ). But since this is 
a small quantity compared to C a + k 2 , assuming (gj» + qi(Y) + d\ — d 2 ) > 0, we have that the minimum 
happens at K a = k 2 + C a , as the function is increasing after the minima. Using similar arguments as 
for the first major event, it is straightforward to show that the rate function satisfies: 



r 2 (w) 



lim 



log[(C a - C a ) 2 exp -$ a ] _ 1 + gi (Y) +di- d 2 f 



\ogB a 



w 



rt+^(Y) 



(24) 



Selecting the optimizing rate. Given the bounds we have computed, the problem reduces to selecting the constant $C_\alpha$ so that the best rate is obtained for $P_{\infty,k_2}(k_2 < \nu_1 < \nu_2)$. In rate matching, we have two rates $r_1(w)$ and $r_2(w)$, and would like to maximize the minimum of both, i.e., $\max_w \min(r_1(w), r_2(w))$, which is obtained by setting $w$ such that $r_1(w) = r_2(w)$, where the rate functions are given by Eq. (22), Eq. (23) (here we denote it $r_1(w)$) and Eq. (24). There are three cases, since the first event has three behaviors for the rate $r_1$:

(1) Consider $b_1 > 0$. Then, for $w < b_1$, in order to have

$$r_2(w) > \frac{4 b_1}{\sigma_1^2(X,Z)},$$

we set:

$$w_1 < \min\left[ b_1, \; \frac{\sigma_1^2(X,Z)}{\sigma_*^2 + \sigma_1^2(Y)} \, \frac{[q_1^* + q_1(Y) + d_1 - d_2]^2}{4 b_1} \right],$$

and get rate $r^* = 4 b_1 / \sigma_1^2(X,Z)$.

(2) Again let $b_1 > 0$. Then for $w > b_1$, in order to have

$$r_2(w) = \frac{w}{\sigma_1^2(X,Z)} \left[1 + \frac{b_1}{w}\right]^2,$$

we set:

$$w_2 = \sqrt{\frac{\sigma_1^2(X,Z)}{\sigma_*^2 + \sigma_1^2(Y)}} \, [q_1^* + q_1(Y) + d_1 - d_2] - b_1,$$

as long as it satisfies $w_2 > b_1$. The obtained rate is $r^* = r_2(w_2)$. Else, set $w_2 = b_1$, and obtain rate $r_2(b_1)$.

(3) Let $b_1 < 0$. Then for $w > -b_1$, in order to have

$$r_2(w) = \frac{w}{\sigma_1^2(X,Z)} \left[1 + \frac{b_1}{w}\right]^2,$$

we set:

$$w_3 = \sqrt{\frac{\sigma_1^2(X,Z)}{\sigma_*^2 + \sigma_1^2(Y)}} \, [q_1^* + q_1(Y) + d_1 - d_2] - b_1,$$

which satisfies $w_3 > -b_1$. The obtained rate is $r^* = r_2(w_3)$.
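The rate-matching step can be sketched numerically. The Python below is an illustrative sketch, not the authors' procedure: it assumes $r_1(w)$ has the piecewise form read from Eqs. (22) and (23), assumes $r_2(w) = Q^2/(wS)$ with the shorthand $Q = q_1^* + q_1(Y) + d_1 - d_2 > 0$ and $S = \sigma_*^2 + \sigma_1^2(Y)$ (both names are introduced here), and merges the three cases into one selection rule, which is one possible reading of the case analysis. A grid search over $w$ is used as an independent check of the closed-form choice.

```python
import math


def r1(w, b1, var1):
    # Piecewise rate of the first event (assumed form of Eqs. (22)-(23)).
    if b1 > 0 and w <= b1:
        return 4.0 * b1 / var1
    if b1 < 0 and w <= -b1:
        return 0.0
    return (w / var1) * (1.0 + b1 / w) ** 2


def r2(w, Q, S):
    # Rate of the second event (assumed form of Eq. (24)); decreasing in w.
    return Q * Q / (w * S)


def matched_w(b1, var1, Q, S):
    # Closed-form choice of w following cases (1)-(3).
    w_cross = math.sqrt(var1 / S) * Q - b1
    if b1 < 0:
        return w_cross  # case (3)
    if w_cross > b1:
        return w_cross  # case (2)
    return min(b1, var1 * Q * Q / (4.0 * b1 * S))  # case (1): stay on the plateau


def grid_max_min(b1, var1, Q, S, w_max=20.0, steps=200_000):
    # Brute-force maximization of min(r1, r2) over a grid of w values.
    best_w, best_val = None, -math.inf
    for i in range(1, steps + 1):
        w = w_max * i / steps
        val = min(r1(w, b1, var1), r2(w, Q, S))
        if val > best_val:
            best_w, best_val = w, val
    return best_w, best_val


if __name__ == "__main__":
    for b1, var1, Q, S in [(1.0, 2.0, 3.0, 1.0), (2.0, 2.0, 1.0, 1.0), (-1.0, 2.0, 3.0, 1.0)]:
        w = matched_w(b1, var1, Q, S)
        print(b1, w, min(r1(w, b1, var1), r2(w, Q, S)), grid_max_min(b1, var1, Q, S)[1])
```

Because $r_2$ is decreasing and $r_1$ is non-decreasing under these assumed forms, the max-min is attained either at the crossing point or anywhere on the plateau where $r_2$ still dominates.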

Upper bounding detection of subsystem 2 and selecting $C_\alpha$ (probability of event $\mathcal{E}_3$). We bound $P_{\infty,k_2}(\nu_2 > k_2 + C_\alpha)$. Let Assumption VII.5 hold and $C_\alpha = \beta \log B_\alpha$. From the definition of the test quantity (Definition 4):

$$\log \Lambda_n(Y,Z) > -\log \Pi_2(n) + \log \pi_2(r) + R^r_n(Y) + R^r_n(Z).$$
Let j] = mm{n : R k k l +n _ x {Y) + i^+n-iC- 2 ) > logB a }, so u 2 < rj. For arbitrary e > 0, let 

T*> = sup{n : K^X^n-lCO + F^^_ X {Z)\ - ( Ql (Y) + q x {Z) + d 2 )\ > e}. 
It is simple to see that: 

logB a > R k l x (Y) + R k U{Z) >(v~ k 2 )(qi(Y) + q x {Z) + d 2 - e) on {q - 1 > T**}. 

So, 

log £ Q 



^2 < f? < &2 + 



Using this result: 



gi(y) + gi(Z)+d-e 



I(t? < 1 + Tf) + (1 + Tf)I(r/ > 1 + T* 



< &2 + 



log£ 



3,fc 2 (^2 > fc 2 + C a ) < Poo,fc 2 (C a < 



gi(y) + ffi(Z) + d-e 



+ 1 + T fe2 . 



gi(y)+ ft (Z) + d-e 



< 



O0,fco 



T £ fc2 + l>log5 Q /3 



ft(y) + gi(Z)+d-e 



< Eocfe exp(T £ fc2 + 1 



a 



/3- 



91<1') + 9l(Z) + <i2- 



1 — a 

< \ n(Y)+q 1 (z)+d 2 - t 



where we used Markov's inequality in the last line. Assumption VII.5 guarantees that $E_{\infty,k_2} \exp(T_\epsilon^{k_2} + 1) < \infty$. The constants in big-O are independent of $k_2, \epsilon$. To obtain the best possible rate for the total error coupling probability, we choose

$$\beta = (1 + \epsilon) r^* + \frac{1}{q_1(Y) + q_1(Z) + d_2 - \epsilon}.$$

Concluding the proof. To put the elements of the proof together, we use the bound:

$$P_{\infty,k_2}(k_2 < \nu_1 < \nu_2) \le P_{\infty,k_2}(\mathcal{E}_1) + P_{\infty,k_2}(\mathcal{E}_2) + P_{\infty,k_2}(\mathcal{E}_3),$$

so the rate function has



- log Poo,k 2 (k 2 <v\<wi) -log 3 maxi Pqo fe (f j 

lim 4 — < hm 



a->0 log-B Q a^O log-Bo, 

. -lQgPoo, fc2 (^) * 

= mm hm — = r . 

i a-+Q log B a 
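The step that turns a union bound over three events into the minimum of their individual rates can be illustrated with a short Python sketch. It assumes, purely for illustration, that each event probability behaves exactly like $B_\alpha^{-r_i}$; the function name and the example rates are hypothetical.

```python
import math


def rate_of_union_bound(rates, log_B):
    # Normalized exponent -log(sum_i B^{-r_i}) / log B,
    # computed with a log-sum-exp so that very small terms do not underflow the log.
    exponents = [-r * log_B for r in rates]
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return -log_sum / log_B


if __name__ == "__main__":
    rates = [1.5, 2.0, 3.0]
    for log_B in (1.0, 10.0, 100.0, 1000.0):
        print(log_B, rate_of_union_bound(rates, log_B))
```

As $\log B_\alpha$ grows, the printed value approaches the smallest of the three rates, since the slowest-decaying term dominates the sum.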

Taking the expectation with respect to $\lambda_2$, we can conclude that the results hold for the measure $P_{\infty,\lambda_2}$, since $k_2$ only appears in either the denominator of the bound rates, or as $l - k_2$, but for $l > k_2$. ■



F. Proof of Theorem 4 

We prove the first assertion. First notice that from Definition 7, Ai(a, fc 2 ) Q Ai(a). Also notice that 

Ai(a, fc 2 ) C Ai(a), so that Ai(a, k 2 ) C Aj(a) n Ai(a). Let 1/1 G Ai(a, fc 2 ), if fci < fc 2 : 

E fcl ,fc 2 [(^l - All) I/l > fcl = — . 

F fci,fc 2 (^1 > 

> j^-T^T^A^ > fcl) -7 fcl , fe2 (^l)). 
Jrfc l5 fe 2 (^i > fei J 

But P fel , fc2 (i/i > fci) = 1-Poo,oo0l < fcl) > for fci < fc 2 using Lemma 3(i), and Lemma 4(ii) 

shows that 7fc 1 ,fc 2 (z^i) — > uniformly over v\, so 

inf E kuk2 [(v 1 -k 1 r\v 1 >k 1 ]>((l-e)L 1 a ) m (l + o(l)) as a -> 0. 

A similar bound works for fc 2 < fci, except P^^^i > fci) = P o,fc 2 ( l/ i > fcl) > 1 — «/n„ for 
i/i G Ai(a, fc 2 ) (Lemma 3(h)). 

For the second statement, we note that: 

inf E AliA2 [(^i-Ai)^] > inf E Al)Aa [(n - Ai)™I(Ai < A 2 )] 

+ inf E AliA2 [(^-Ai)!pi(Ai >A 2 )] 

We can use Lemma 4 (i) and (iii) to bound such quantities in the same manner as in the first case. 
Lemma 3 (i) and (iii) can be used to bound the appropriate probabilities as before. 



G. Proof of Theorem 5 

We divide the proof into computing an upper bound (item (a)) and the lower bound (item (b)). First, we compute the upper bound in Lemma 5. Denote by $\nu_1 = \nu(X, Z)$ the stopping time given by Eqn (9). We would like to bound the expectation $E_{\lambda_1,\lambda_2}[(\nu_1 - \lambda_1)^+]$. In order to do this we need Assumption VII.4. Assumption VII.4 is stronger than Assumption VII.3, and in fact the latter follows from the former [25]. We can now proceed to prove the theorem.
(a) Define:

qf = qi {X) + qi {Z) + d 1 ,qf = qi(X) + d 1 , 
S a (ki,k 2 ) = P fcl) fc 2 (*/i > v 2 ),fjL a (kx,k 2 ) = Pfc^fe^i > h), 

$a =Pa 1 ,A 2 (i / 1 > V 2 ),Ha =P Al>A2 (V 1 > u{). 
We start by analyzing the expectation of the stopping time, using the definition of $\nu_1$ and $\nu_2$:

E fel)fe2 [(Pi - Ai)+] = E fcljfc2 [(y x - AO+%1 < v 2 )\ + 

+ E kuk2 [(Pi - > u 2 , Pi > v 2 )] + 

+ E klM [{y 2 - Ai) + I(z/i > v 2 > Pi)] 

Each expectation can be bounded individually. The first expectation is bounded by using Lemma 5, setting $A = \{\omega \in \Omega : \nu_1(\omega) < \nu_2(\omega)\}$:

log(a) 



E fcl ,*: a [(^i - Ai)+%i GA)] m < 



9i 



log(a) 



< Kz)(l + o(l)) 
(l-(5«(fcl,*!2))(l + o(l)). 



For the remainder of the proof, let $E$ denote $E_{k_1,k_2}$ and $P$ denote $P_{k_1,k_2}$. We return to the usual notation wherever necessary. Also, we show the results for the case $m = 1$; the modifications for the case $m < r$ are straightforward. The second expectation is bounded as:



< E [(Pi - 


-Ai 


= E[(Pi - 


Ai) 


< E[(Pi - 


Ai) 


< E[(Pi - 


Ai) 


= E[(Pi - 


Ai) 


< E[(Pi - 


Ai) 



m (u 1 > Ai H -j— , z^i < v 2 ) 



qf 



(Pi > Ai + 



logB c 



Si 



qf 
qf 

OS ^ a (l - e a - <5 a (A;i, A; 2 )) 



(z/l > z/ 2 ) 



(Pi > Ai + 



qf 



lux > v 2 ) 



< 



Efc ll0 o[(Pi - Ai) 
log B a 



<?i 



(eQ + ^a^l,^))- 



Since (1) In third line we used P(A D B) > P(A) - P(B C ); (2) In fifth line, Pi does not depend on 



k 2 ; (3) P fcli00 (Pi > Ai + 
Lemma 5 (fifth statement). 
Finally, 



log -Bo 



> 1 — e a , by Lemma 4 (iv) and (4) Efc 1)0O [(Pi — Ai) ] is bounded by 



E [(u 2 - Ai)+ I(yi >u 2 > Pi)] < E[(i/i - Ai) + > Pi)] 

log^ 



< 



Si 



Vi > Pi)(l + o(l)) 



= l ^^^(h,k 2 )(l + o(l)). 
qf 

Where we used (1) v\ > v 2 in the first line and (2)in the second line, Lemma 5 (third assertion), setting 

A = {uj G S7 : vi(uj) < Pi(w)}. 
In sum, we have:

E[(ux-\x) + ] < ^^(l-S4ki,k 2 ) + f ,4ki,k 2 ))(l + o(l)) + ^^(e a + d a (kuk 2 )) 
q a q a 

_ logBa -8 a (k 1} k 2 ) + fi a (k±,k 2 ) + o(l)) + l0g ,f " (cq +5 a (ki,k 2 )). 
q a q a 

To obtain the delay, divide 

E AliA J(i7 1 -A 1 )+] < l0 ^(l-<5 a + / x Q + (l)) + l0 |^(e a + ^), 

q a q a 

by (using Theorem 2), 

EVa^i > Ai) > l- a -^ liAa (^) 
l-o(l) 

and we obtain the result in the Theorem since (1) e a and fj, a are o(l) (Lemmas 4(iv) and 6) and (2) 
£"1 a C^ 1 ) * s as tne P roce dure is regular. 

We can now prove the matching lower bound for the delay. 

(b) For the remainder of the proof, let $E$ denote $E_{k_1,k_2}$ and $P$ denote $P_{k_1,k_2}$. First notice that:

E [(Pi - Ai)+] = E [(yx - Ai)+%i < i/ 2 )] + E [(max(5i, i/ 2 ) - Ai)+I(i/i > ^)] 
> E [(^ - Ai) + I(z/i < i/*)] + E [fa - Ai) + Ifa > i^j)] . 
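The decomposition above can be checked pointwise with a small Python sketch. It rests on a reading of the garbled display that should be treated as an assumption: the composite rule for subsystem 1 stops at `nu1` when `nu1` does not exceed `nu2`, and otherwise at the maximum of a second rule `nu1_hat` and `nu2`. The names `nu1`, `nu2`, `nu1_hat`, `lam`, and the tie-breaking convention are all hypothetical labels introduced here.

```python
import random


def positive_part(x):
    return max(x, 0)


def composite_stopping_time(nu1, nu2, nu1_hat):
    # Assumed composite rule: trust nu1 if it fires first (ties included),
    # otherwise wait for both the second rule and subsystem 2.
    if nu1 <= nu2:
        return nu1
    return max(nu1_hat, nu2)


def split_terms(nu1, nu2, nu1_hat, lam):
    # The two terms of the identity, and the second term of the lower bound.
    first = positive_part(nu1 - lam) if nu1 <= nu2 else 0
    second = positive_part(max(nu1_hat, nu2) - lam) if nu1 > nu2 else 0
    second_lower = positive_part(nu1_hat - lam) if nu1 > nu2 else 0
    return first, second, second_lower


def check_decomposition(trials=10_000, seed=0):
    # Verify the identity and the lower bound on random integer outcomes.
    rng = random.Random(seed)
    for _ in range(trials):
        nu1, nu2, nu1_hat, lam = (rng.randint(1, 50) for _ in range(4))
        delay = positive_part(composite_stopping_time(nu1, nu2, nu1_hat) - lam)
        first, second, second_lower = split_terms(nu1, nu2, nu1_hat, lam)
        if delay != first + second or delay < first + second_lower:
            return False
    return True


if __name__ == "__main__":
    print(check_decomposition())
```

Since both relations hold outcome by outcome, they carry over to expectations under any of the measures used in the proof.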
We can now bound the first term. 

E [{vi - Ai)+I(i/i < v 2 )] > 1% F(u! - kx > (1 - e)Lf, v x < v 2 ) 

= L\ \P(ux > fei A fc 2 , fl < v 2 ) ~ < z^i < fei, i>i < f 2 ) 
- P(&i < vx < h + (1 - e)Lf , z/i < i/ 2 )] 

> L? [Pfa > kx A fc 2 , ^ < ^) - £ 1)fca fa) " lt M) (vi)\ 

> L? [Pfa > fc! A fc 2 ) - Pfa > ^) - 4 Q lifc2 (^i) - T&^Vi)] 

= Lf [1 - P^fa < fc! A to) - - - 7ft' fc2 Vi)] 



« - S a (kx, k 2 ) - % i>k2 (vx) - l^\vx) 

1A fciAfc a 



Where (1) in the fourth line we get a lower bound since the subtracted probabilities are upper bounded, and we identify them with previous definitions; (2) in the fifth line we use $P(A \cap B) \ge P(A) - P(B^c)$; (3) in the sixth line we use a change of measure; and (4) in the seventh line we use Lemma 3(i).

So if the procedure is (kx,k 2 ) regular, E \{y\ — \i) + I{vx < u 2 )] > Lf (1 — S a (kx,k 2 ) + o(l)). For 
the averaged case over the priors, the last line above should be replaced using the false alarm bound for 
vx • Notice that P(fi < kx A k 2 ) < Pfe lj0 o( I/ l — kx), and average the last statement over kx to obtain 
Pai .00(^1 < Ai) < a, so the last line is replaced by 

> L? [l-a-^-^W-Te,*)] 
The second expectation can be bounded similarly:

E - Ai)+I(i/i > v 2 )] > Lf P(Pi - h > Lf, v x > v 2 ) 

> L« [F(h -h> Lf ) - P(^i < v 2 )] 

= LI [l-P(u 1 <k 1 )-^^\u 1 )-l + 6 a {k 1 ,k 2 )]. 

Finally, we use the trivial upper bound Pk u k 2 (^i > &i) < 1 — o(l) and take expectation with respect 
to Ai and A2 to obtain the result in the theorem. 
Appendix

A. Lemma 2 

Lemma 2. Consider the function $f(x) = (a + bx)^2 / (c + dx)$, with $a, c, d > 0$. The following properties hold:

(a) If $b > 0$ and $a/b > 2c/d$, the function is decreasing in the interval $x \in [0, x_{\min}]$ and increasing in $x \in (x_{\min}, \infty)$, where $x_{\min} = a/b - 2c/d$ is the point of minimum and $f(x_{\min}) = (4b^2/d)(a/b - c/d)$.

(a') If $b > 0$ and $a/b < 2c/d$, the function is increasing in the interval $x \in [0, \infty)$, the point of minimum is $x = 0$, and $f(x_{\min}) = a^2/c$.

(b) If $b < 0$, the function is decreasing in the interval $x \in [0, x_{\min}]$ and increasing in $x \in (x_{\min}, \infty)$, where $x_{\min} = -a/b$ is the point of minimum and $f(x_{\min}) = 0$.

Proof: Follows from noticing that the derivative is $f'(x) = (a + bx)(2bc - ad + bdx)/(c + dx)^2$.
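The three cases of Lemma 2 can be sanity-checked numerically. The Python sketch below compares the closed-form minimizer with a brute-force grid search on $x \ge 0$; the function names, the grid resolution, and the sample coefficients are choices made for this illustration.

```python
def f(x, a, b, c, d):
    # The function studied in Lemma 2.
    return (a + b * x) ** 2 / (c + d * x)


def lemma2_minimizer(a, b, c, d):
    # Closed-form (x_min, f(x_min)) on x >= 0, following cases (a), (a') and (b).
    if b > 0 and a / b > 2 * c / d:
        return a / b - 2 * c / d, 4 * b * b / d * (a / b - c / d)
    if b > 0:
        return 0.0, a * a / c
    if b < 0:
        return -a / b, 0.0
    raise ValueError("b = 0 is not covered by the lemma")


def grid_minimizer(a, b, c, d, x_max=20.0, steps=200_000):
    # Brute-force search over an evenly spaced grid on [0, x_max].
    best_x, best_val = 0.0, f(0.0, a, b, c, d)
    for i in range(1, steps + 1):
        x = x_max * i / steps
        val = f(x, a, b, c, d)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    for coeffs in [(10.0, 1.0, 1.0, 1.0), (1.0, 1.0, 2.0, 1.0), (3.0, -1.0, 1.0, 2.0)]:
        print(coeffs, lemma2_minimizer(*coeffs), grid_minimizer(*coeffs))
```

For the first set of coefficients the minimum sits at $x = 8$ with value $36$, which matches $(10 + 8)^2 / (1 + 8)$.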



B. Lemma 3 

Lemma 3. Let $\nu_1$ be a valid stopping time such that $\nu_1 \in \mathcal{F}_n(X, Z)$. Consider the stopping rule classes in Definition 7. Then:

(i) If vi G Ai(a), then for all n < k\ < Pfc^^i < n) < 

(ii) If v\ G Ai(a, k^), then for all n < k\: Pfc^fVi < n) < j^-. 

(iii) If vi G Ai(a), then for all n < k\: Pfc 1; A 2 (^i < n,X 2 < ki) < 

Proof: All assertions follow the same proof guideline. First notice that: 

{{ui < n}n{Xi > n}) 
= Paldo^i < n \ x i > n)V Xl)0 o(M > n) 
= Poo,oo(^i < n)Ii l n . 

Next, as v\{X,Z) G Ai(a), we have P f ^ 1 '°°(^ 1 ) < a. To conclude, for the choices of k\,k 2 and n in 
the lemma P fcli fc 2 (^i < n) = P 00 , o(i / i < n). ■ 

C. Lemma 4 

We state a basic lemma that is used to bound probabilities of false alarm in a given class. Compare this to Lemma 1 in [25].
Lemma 4. Define for all $0 < \epsilon < 1$:

7 gx>M (l/1 ) = P fcijfca (fc 1 <^<A; 1 + (l-e)L?), 

7e,a(^i) = F Al ,A 2 (Ai<^i<Ai + (l-e)L?), 

76 > a(^i,A 2 < AO = PAx I A s (Ai<^<Ai + (l-e)Lf,A 2 <A0, 

T& ,fe) ("0 = Pfci,fe(fci <"1 <fci + (l-e)Lf), 

76,a(^) = ff , A 1 ,A 2 (Ai<z. 1 <Ai + (l-e)L?,A 2 <A0, 

7 (^M (z>1 ) = p faifa (AM<i> 1 <A:i + (l-e)Lf) 1 

7 e ,a(i>0 = PalA^Ai <z>i < Ai + (l-e)Z?). 

where $d_1$ is given in Assumption VII.1, L" and L" are given by Definition 6. Then for all $k_1, k_2 > 1$ and $0 < \epsilon < 1$:

(i) lim^o sup„ l6Al ( a ) <ye k a k2 \vi) = 0, 
lim a ^o snp UieAi( ^ a) % t a(u 1 ) = 0, 
lim Q _ >0 sup JyigAi(Q )7 eia (^i,Ai < A 2 ) = 0, 

(ii) lim a ^o sup^eA^a.fe) 7e,a' fa) (>0 = °/ or fc i > fc 2, 
lim a ^ sup iyieAi(Q fc2) 7 e {fc Q 1 ' fc2) (^0 = Ofor k\ < k 2 , 

(in) lim a ^o su Pj, lGAl ( Q ) Te.aOO = °> 
(iv) lim Q ^ sup i>igAi(Q ) 7 e ( ,a' fc2) (z>0 = 0, 
lilTla-X) supj> ieAi ( Q) 7e,a(^0 = 0. 

An analogous result holds for v% belonging to classes A2(a)> A2(ct, fo) and A2(a). 
Proof: (i) We can first build our bound by a change of measure argument: 

Poo.oo (h <vx<kx + (l- e)X?) = 

= E fcl>fe {l ( kl < v < k x + (1 - e)L ? ) e-K 1 

■> F /lT c -(i?" 1 1 (X)+^ 1 lAfe2 (Z))\ 
^ fc fc*lV<"<*i+(l-^m+^(2)<C) e ^ ') 

>e- c F kuk (k 1 <v<k 1 + (l-e)L«, max + R k n ^(Z) < c] 

\ fe 1 <n<fei+(l-e)-Lf / 

> e- c [P fel , fe2 (h < v < h + (1 - e)L?) - 

P fcl , fc2 f max ^(X) + J R^(Z)>c) . 
Choosing C = (1 — e 2 )(c7i(X) + q%(Z))L®, and rearranging we obtain: 

7 (fa,fe0 < e (l-e=')( 9l (J>f)+, 1 (Z))L fPo0)0o < ^ < ^ + (1 _ e)L?) + (25) 

+ Pfe.faC m ax l#(X) + i# Afc "(Z)>c) 
\fci<n<fci+(l-e)if / 

We now analyze each of the two parts in the above. We start with the second term: 

/We, a) = Pfe.fef max + R^(Z) > c) 
< F klM ( max R%(X) + R^(Z) > C + 

\fci<rKfci+(l-e)Lf / 

+ F fcl ,fc 2 [C-R z < max i# (X) + R% (Z) <C,R z >0 

k 1 <n<k 1 + (l-e)Lf 

< F klM [ max R%(X) + R%(Z)>c) + 
+ P fa)fcl (C-.Rz< max l^(X) + i^)<C 

fci<n<fci + (l-e)L5" 



ki,k 2 



max i# +n PO + i?^ +n (Z) > c] + 



+ P fcl)fe C-^< max + C 

1 0<n<(l— £)L° 



ki,k 2 



^ max i?£ +n (X) + > + 

-oS ^» (I) + i! » (2)<?f ) 



Where i? z = Hj^_ 1 (Z), & = (1 + e)(gi(X) + gi(Z)) and iV a = [(1 - e)ZfJ. Now noticing that as 
a — > we have iV a — >• oo, we have using assumption VII.3 and properties of measure: 



max i?^ +n (X) + R k k ) +n {Z) > q e ) 

N a 0<n<N a fcl+nA ' fel+nV ' ) 



l - max R k k ) +n {X) + i$ +n (Z) > <k \ ->• 0. 



AT, 

Because -8^ — )• almost surely, we have the second probability going to zero. Thus f3 klt j- 2 (e, a) — > 
as a — > 0. We now proceed to bound the first probability in Eq. (25), using the result from Lemma 3(i) 
and using the definition of N a and q = q\{X) + q\(Z): 

Pkuk2 (e,a) = e( 1 - e2 )^W +9l ( z » L fP 00i00 (A: 1 <z, 1 <A; 1 + (l-6)Lf) 

< e (l-e»)( ffl (J0+ 9l (^))Z,f PoO)Oo fa < ^ + (1 _ £ ) L «) 

< " Jl-e>)qL? 

Notice that a = e~( <?+rfl ) i ? from the definitions. Thus: 

log(p fcl , fa (e,a)) (l-e 2 )gLf + log^^ 



iV Q - iV a iV a jV a 



< 



N a N a h + N a N a 

[1 + e)q(N a + 1) _ ^iVa _ logn^ +jVa fc 1 + jV Q 
iVa N a k! + N a N a 

£ 2 q + di _ iogn^ i+JVa f _fc]\ (l + e)g 
Taking limits, and using the tail assumption:

Hm log(Pfci,fc 2 ( g >oQ) < e 2 g + di d = e 2 q + edg 
a->o N a ~ 1 - e 1 1 - e 

It is now clear that Pk ± ,k 2 ( e ^ a ) ~ > 0- We have shown that for all v\ G Ai(a): 

7ft' fc2) (^i) < /9fc 1( fe(e,«) + p fcll * a (e,a). -> 
We can complete the result by studying the behavior of j e a . Let N a = [eL"\. From the definition: 



oo oo 



k 1= l k 2 =l 

- n k + 7r i( A; i) 7r 2(fc2)(/3fc 1 ,fc 2 (e,a) +Pfc 1 ,fe 2 (e,a)) 

fc 1= i k 2 =i 

< + sup p fcl)fe (e,a) + ^2 ^2 7T i( k i) 7T 2(k2)l3k 1 ,k 2 (^ a )- 

ki<N a k 1= lk 2 = l 

Now as a — > 0, IXj~ — >• by definition, and the third term in the above sum goes to zero by 
Dominated Convergence Theorem and the fact that ^ fc 2 (e, a) — > 0. For the second term, we make a 
minor modification in the first proof of convergence of pk ± ,k 2 (e> &), by noticing that il* is a non-increasing 
function of n: 

sup P fel , fc2 (e,a) < ttT~~ e^ 1 "^?. 

ki<N a N a +N a 

Then continuing as before, replacing k\ by iV a , we obtain: 

log(sup fc <i y o Pfc„A 3 (e,a)) e 2 g + di / e \ e 2 g 

lim = < h d\ 1 H = . 

iV a - 1 - e V 1 - e / 1 - e 

Clearly this shows that $\sup_{k_1 < N_\alpha} p_{k_1,k_2}(\epsilon, \alpha) \to 0$, concluding the proof. The proof for the third statement is the same as the above, except the sum over the priors is only over the cases $\lambda_1 < \lambda_2$.
(ii) The proof is as in (i), except we use the change of measure for $k_2 < k_1$:



■ oo, k 2 



k 2 (h <vx<kx + {l- e)Lfj = E kl)k2 {l (fei < v < h + (1 - e)L?) e"^ W)} . 



For $k_1 < k_2$ we use the same change of measure as in (i). We again can use Lemma 3(ii). For the cases (iii) and (iv) the proofs proceed similarly. ■
D. Lemma 5

Lemma 5. Let the stopping time $\nu_1 = \nu(X, Z)$ be defined as in Eq. (9). If Assumption VII.4 holds, then as $\alpha \to 0$, for all $m < r$, and all events $A$:



E jfcl)fe J(i/ 1 -Ai)+] 
E XuX2 [(u 1 -X 1 )+] 



< [Ll] m (1 + 0(1)), 

< [L\] m (l + o(l)), 



E fel , fc2 [(^-Ai) + I(^i eA)] m < [4] m P ifcl)fe2 (^ie > A)(l + (l)) 5 

E^iih - Xi) + ] 
K Ai ,a 2 [(^i-Ai) + ] 



E fcljifc2 [(^-Ai)+I(i?i eA)Y 
EW^i-AO+I^ie.A)]' 
where L\ and are given in Definition 6. 



< 
< 

< 

< 

< 



[^] m PA 1 ,A 2 (^ie^)(i + o(i)), 

Li (l + o(l)), 
(l + o(l)), 

P felifc2 (z?ie^l)(l + (1)) ; 

F Al ,A 2 (i>i £^)(l + o(l)), 



Proof: By definition of i^, since we are using the SRP statistic: 



log(A n (X,Z)) > log 

— D n . 

We can define a stopping time: 



7Tl(fci 

ni 



+ R k ^(X) + R^{Z) 



fci , 



r 7 (fe 1 )=inf{n:5^ +ri - 1 >log(Ba)} 



Notice that z^i — fci < on v\ > k\, as r/(A;i) starts at k\ and the Shiryaev statistics only includes 

values in range {k\,n) after time k\. Define: 



f e (fcl) = sup \ n > 1 : 



S k k l +n _ l (X)-q 1 (X)+q 1 (Z) + di 



n 



> e 



Due to Assumption VII.4, and because ^ log ( 



d\ as n — > oo, we have E^^T^] < oo. 



Furthermore, from the definition of rj and setting = qi(X) + q\(Z) + d\. 

log(fl«) > SJk)-i > (ryCfci) - feiJCga - e) on [ v (k) - 1 > if')} 
We can bound for all < e < qy: 



v(h) < % + 1 ^: 



< T e (fcl) + 1 + A;i + 



g d -e {•»(*)-!>*. 

log(S a 



+ (^f + 1)1 



{r,(fc)-l<T e (fcl) } 



1d~ e 



So: 



^ - &i 



i + i 

+ ; tt^ + 



log(£ Q ) " log(5 a ) log(B 
Letting $\epsilon \to 0$, noticing $E_{k_1,k_2}[T_\epsilon] < \infty$, and letting $\alpha \to 0$ ($\log(B_\alpha) \to \infty$) we obtain the first result in the Theorem for all $m < r$. Averaging over the priors, noticing $E_{\lambda_1,\lambda_2}[T_\epsilon] < \infty$ we obtain the second. For the third and fourth results, it suffices to notice that:

{vi - fci)i(^ £ A) < rj fcl) | i + fei | g A) 

\og{B a ) ~ \og{B a ) \og(B a ) q d -e 

The proof follows along similar lines for v%. ■ 
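The scale $\log(B_\alpha)/q_d$ that drives the delay bound can be illustrated with a simulation. The sketch below is a simplified stand-in, not the statistic of Eq. (9): it assumes a plain Gaussian random walk with positive drift in place of the post-change log-likelihood sum (the prior term is ignored), and measures how the mean first-passage time over a threshold compares with threshold divided by drift. All names and parameter values are hypothetical.

```python
import random


def first_passage_time(threshold, drift, sigma, rng, max_steps=1_000_000):
    # Number of steps until a random walk with i.i.d. N(drift, sigma^2)
    # increments first exceeds the threshold.
    s = 0.0
    for n in range(1, max_steps + 1):
        s += rng.gauss(drift, sigma)
        if s > threshold:
            return n
    raise RuntimeError("threshold not crossed within max_steps")


def mean_delay_ratio(log_B=50.0, drift=0.5, sigma=1.0, trials=2000, seed=0):
    # Ratio of the empirical mean crossing time to the first-order scale log_B / drift.
    rng = random.Random(seed)
    total = sum(first_passage_time(log_B, drift, sigma, rng) for _ in range(trials))
    return (total / trials) / (log_B / drift)


if __name__ == "__main__":
    print(mean_delay_ratio())
```

The ratio stays close to one, with a small excess due to the overshoot at the crossing time, which is absorbed by the $(1 + o(1))$ factor as the threshold grows.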

E. Lemma 6 

Lemma 6. Let $\mu_\alpha(k_1, k_2) = P_{k_1,k_2}(\nu_1 > \hat\nu_1)$ and $\mu_\alpha = P_{\lambda_1,\lambda_2}(\nu_1 > \hat\nu_1)$, where $\nu_1$ and $\hat\nu_1$ are given in Eqns. (9) and (10). Then $\mu_\alpha(k_1, k_2) = o(1)$ and $\mu_\alpha = o(1)$ as $\alpha \to 0$.

Proof: First, we note that 

Pfa.feOi > h) < ^k u k 2 (vi > h,f>i > h + L a ) +¥ kuk3 (i>i < h) +P kl ,k 2 (ki <vi< ki + L a ), 

and asymptotically, in a, the last two terms are o(l). Next, we follow along the lines of the first part 
of Lemma 1, to derive the result. Let P denote Pfci,fc 2 > ^PO = 0°6 (X) > log-B a } and T{L a ) = 
[ki + L a ,oo): 

oo 

P kl ,kM>^^i>ki + L a )< J2 W({logA l (X,Z)<logB a }n£(X),i) 1 = l) 

l=ki+L a 

< jr, p(|logA i (X)+ min Rf(Z) -loglli(Z) <lo gj B a | nS(X),h = lj 

< V f( max-Rf(Z)>-logIL 1 (!),v l = l) 
, tlx ^ se[1 ' l] ' 

l=k 1 +L a 

< r P(i> 1 = |)p f max -iff (Z) > - loglli(0^ 



Z=fci+L Q 



< max P max -Rf(Z) > -logili(Z) 
leZ(L a ) \se[i,i] 

< max I maxP(-i?f(Z) > -loglli(Z)) 



< max Z max exp 



(Vi+ldx — min(r, k\) qo{Z) + [l — max(r, k\) + \} + qi(Z)) 2 



Jex(il) " re[i,F] 1 lmax(ffg(2),^(Z)) J 

where $V_l = -\log \Pi_1(l) - l d_1$. Note that for $l > L$ for some $L$, $|V_l| < \epsilon$ due to Assumption VII.1. Thus when $r < k_1$, the maximum happens at $r = k_1$, with rate upper bounded by



T(l) 



( e + i dl _ kl q ^ Z ) + (l-k t + l)qi{Z)f 



I max(cj2(Z),^(Z)) 
Else, the maximum happens at $r = l$, with rate upper bounded by



r(Z) 



(e + Zdi+gi(Z)) 2 
Z maic(og(Z),(7?(Z)) 
In both cases, for any $l \in [k_1 + L_\alpha, \infty)$, $r(l) \to \infty$ as $\alpha \to 0$. Thus we obtain $P_{k_1,k_2}(\nu_1 > \hat\nu_1, \hat\nu_1 > k_1 + L_\alpha) = o(1)$. Since $k_1$ only appears multiplying an exponentially small probability, as both rates go to infinity uniformly over $k_1$, we can apply expectation to both sides, and obtain that $P_{\lambda_1,\lambda_2}(\nu_1 > \hat\nu_1) = o(1)$, as $E[\lambda_1] < \infty$. ■